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Preface 
Preface to "Topics in Applied Probability", a supplementary collection of 
further examples, problems, and m-files related to Applied Probability. 


Topics in Applied Probability is meant to supplement the collection Applied 
Probability with examples, further problems, and m-files. This collection is 
not meant to stand alone but rather serve as a miscellany of further content 
not included in the original collection. Topics discussed in this collection 
include Martingale Sequences and Markov Procedures as well as a few 
others. In addition, the collection of m-files contains both the user-defined 
programs and a collection of m-files for specific problems with properly 
formatted data. 


MATLAB Files 

These files can be entered into MATLAB by calling the appropriate file. 
These m-files come from a variety of sources ( e.g., exams or problem sets, 
hence the odd names) and may be useful for examples and exercises. In 
order to allow for easy access to the helpful m-files, a supplementary link 
has been placed in the upper right side of each module in the "Topics in 
Applied Probability" collection. Selecting this link will launch the 
download of a zip document containing a suite of m-files. Unzipping this 
file and saving its contents to your local system will allow for easy and 
reliable access when working with MATLAB. 


Cost with Price Breaks 
A general pattern for costs with price breaks is set up and an application is made in the case demand D has a 
Poisson distribution. 


Let D be the demand random variable. Suppose there is 


e A flat fee of C for D < a; 

e A cost of c; per unit fora; < D < ay 
e A cost of c> per unit for ag < D< az 
e A cost of c3 per unit for a3 < D 


Then (see figure) 
Equation: 


g(t) = C+ c1Im, (t) (t — a1) + (e2 — c1) Lm, (t) (t — a2) + (c3 — c2)Iu, (t) (( — a3) where M; = [a;, co) 


Note:Since (t — a;) = 0 at t = a;, we could as well have M; = (aj, 00). 


Special Case. If D is Poisson (4), we may obtain an exact solution for E[g(D)] by using the fact that 
E Imo) (D) = P(D>m) and 
Equation: 


fore) oo k 


k 
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As an alternate approach, we approximate D by an appropriate number of values. 
Exercise: 
Example 


Problem: 


A residential College is planning a camping trip over Spring Recess. The number D of persons planning to 
go is assumed to be Poisson (15). Arrangements for transportation and food have been made as follows: 


e If D < 4,a minimal flat fee of $450 will be charged. 
e If4 < D < 19, the additional charge is $100 for each person more than 4 (but less than 20). 
e If 19 < D, the charge is $80 for each person more than 19. 


Thus, C' = 450, a1 = 4, a2 = 19,c; = 100, and cz = 80. The distribution for D is Poisson (15). Then 
Equation: 


Y = g(D) = 450 + 100Zj4,.0) (D) (D — 4) + (80 — 100) Ij19,00) (D) (D — 19) 
Equation: 
= 450 + 100I{4,00) (D)D — 20Ii19,00) (D)D — 400Tj4,.0) (D) + 380I (19,00) (D) 
a. Obtain the “exact” solution to E[g(D)}. 


b. Approximate D by terms up to 40 and obtain E[g(D)] and P(g(D) > v) for v = 1000, 1500, 2000, 
2500. 


Solution: 


mu = 15; 

% Part (a) 

EG = 450 + 100*mu*cpoisson(mu,3) - 20*mu*cpoisson(mu,18)... 
- 400*cpoisson(mu,4) + 380*cpoisson(mu, 19) 
EG = 1.5433e+03 

% Part (b) 

t = 0:40; 

Mics tf Secs 

M2 = t >= 19; 

G = 450 + 100*M1.*(t - 4) - 20*M2.*(t - 19); 
PD = ipoisson(mu,t); 

EGa = dot(G,PD) 

EGa = 1.5433e+03 


P1000 = dot(G >= 1000,PD) 
P1000 = 0.9301 
P1500 = dot(G >= 1500,PD) 
P1500 = 0.5343 
P2000 = dot(G >= 2000,PD) 
P2000 = 0.1248 
P2500 = dot(G >= 2500,PD) 
P2500 = 0.0062 


Order Statistics 
A basic treatment of order statistics. 


In sampling statistics (see Sec 18.7), we deal with an iid class {Xi Lax n} of random variables, 
where n is a prescribed positive integer known as the sample size. An observation of this class gives an n- 
tuple of numbers (¢1, t2,---,¢,). As an extension of the extreme values in the case of two variables (Ex 
2.11), it is often useful to define a random variable Y; whose value for any w is the smallest of the X; (w) 
; a second random variable Y» whose value at w is the next smallest of the X; (w), and so on through Y, 
whose value at w is the largest of the X; (w). We would like to be able to obtain the distributions for 
these new random variables in terms of the common distribution for the X;. We formulate the problem as 
follows. 


Example: 
Order Statistics 
Suppose {X; : 1 <i < n} is iid, with common distribution function F. Let 


e Y; =smallest of X1,X2,...,Xn 
e Y; = next larger of X1, Xo,...,Xn 


O IZ SETAE OF AGL, 2s 25 op 2Ga 


Then Y; is called the kth order statistic for the class {X; : 1 < i < n}. We wish to determine the 
distribution functions F;, (t) = P(Y;, <t)1 <k <n. Now, Y; < tiff k or more of the X; have values 
no greater than t. We may view the process as a Bernoulli sequence of n trials. There is a success on the 
ith trial iff X; < t. The probability p of a success is p = P(X < t) = F(t). Hence 

Equation: 


F, (t) = P(¥, < t) = P(k or more of the X; lie in (— inf,t]) = 520 (n,j)F? (#)[1 — F(t)|"? 
a 


Remark. Once the common distribution function F for the X; is known, then the F; are calculated in a 
straightforward manner. For that purpose we may use the MATLAB function cbinom. 


Example: 
Suppose the X; are exponential (2). Then Fx (t) = 1 — e- for positive t. Suppose n = 5. We calculate 
Ho tort — Os 0) Sues. Oe: 


5; 

0.1:0.2:0.9; 

length(t); 

1 - exp(-2*t); 

for i = 1:m 

FK(i,:) = cbinom(n,F(i),1:n); 


disp([t' F' FK]) %k=1 k=2 k=3 k=4 k=5 


0.1000 0.1813 0.6321 0.2249 0.0445 0.0046 0.0002 
0.3000 0.4512 0.9502 0.7456 0.4091 0.1324 0.0187 
0.5000 0.6321 0.9933 0.9354 0.7364 0.3946 0.1009 
0.7000 0.7534 0.9991 0.9852 0.9000 0.6400 0.2427 
0.9000 0.8347 0.9999 0.9968 0.9653 0.8064 0.4052 


The following special case is important in characterizing the Poisson process (see Sec 21.1). 


Example: 

Order statistics for uniformly distributed random variables 

Suppose {U; : 1 <2 < n} is iid, uniform on (0, T]. Determine the distribution functions for the order 
statistics. 

SOLUTION 

The common distribution function for the U; is given by F(t) = t/T, O<t< T. According to the 
result in Ex 2.16, the kth order statistic Y; has the distribution function 

Equation: 


F(t) = P(Y¥e <t) = Yrema(e) (7+) <t<T 


Martingale Sequences: The Concept, Examples, and Basic Patterns 

The notion of martingales and related concepts seem to have originated in studies of games of chance. Certain 
patterns were identified and extended to more general sequences of random variables. The resulting abstract 
theory provides a basis for many applications, both theoretical and practical. 


The concept, examples, and basic patterns 


A classical example 


The notion of martingales and related concepts seem to have originated in studies of games of chance similar to 
the following. Suppose 


To 6 


Y,, =a gambler's “gain” on the nth play of a game 
Yo = the original capital or “bankroll” 


n 
Set X,, = 0 forn < 0,X, = >, Y,Vn > 0. Thus, X,, is the capital after n plays, and 
k=0 
Equation: 


Yo = Xo Yn = Xn — Xn-1 VYn>0 
Put U, = (Xo, X1, ---, Xn) and V, = (Yo, Y%1, ---, Yn). Foranyn € N, Up = gn (Vz) and 
V, = hy (U;,) or, equivalently, o (U,) = 0 (V,,). Hence E |H|U,] = E |A|V,] a.s. 


If Yy is an independent class with E[Y,,] = OVn > 1, the game is considered fair. In this case, we have by 
(CES), (CE7Z), and hypothesis 
Equation: 


E[Xn41|Un] = E [Ynti|Vn] + E[Xn|Un] = E[Ynsi] + Xn = Xn as. 


Also E[Xnai — Xn|Un] = E[Ynui|Vn] =0 as. 


Gamblers seek to develop a “system” to improve expected earnings. We examine some typical approaches and 
show their futility. To keep the analysis simple, consider a simple coin-flipping game. Let 


A, = event of a “head” on the kth component trial 
T;, = H, = event of a “tail” on the kth component trial 


The player has a system. He decides how much to bet on each play from the pattern of previous events. Let 
B, [In, — Ir,] = BnZn be the result of the nth play, where |B,,| is the amount of the bet; B,, > 0 indicates a 
bet on ahead; B, < 0 indicates a bet on a tail; B = 0 indicates a decision not to bet. 


Systems take various forms; here we consider two possibilities. 


1. The amount of the bet is determend by the pattern of outcomes of previous tosses 
Equation: 


Bn = 9n-1 (In, In, ees: Ty,_.) Yn = BrZn Zn = Ty, = Ir, = 217, = 1 


2. The amount bet is determined by the pattern of previous payoffs 
Equation: 


B,, = 9n-1 (Yi, Yo, res Y=t) = Rn-1 (Bi, Tx,; as By, w,;) Y, = BrZn 


Let Yo = Xo = C, aconstant. Since C is independent of any random variable, E[H|C] = E[H]. In either 
scheme, by (CE8), (C15), and the fact E[Z;,] = 0 
Equation: 


BE [Yn4i|Vn] =E [Bn+1Zn41|Pn (Bi, tes tie B,In,)| = Brayk [Zn+1| =Oas. 
It follows that 
Equation: 
E[Xn4i|Un] = E [Yne=ilvn] + E[Xn|Un] =0+ Xp as. 
The “fairness” of the game is not altered by the betting scheme, since decisions must be based on past 


performance. In spite of simple beginnings, the extension and analysis of these patterns form a major thrust of 
modern probability theory. 


Formulation of the concept 
In the following treatment, 


Xn = {X,: n € N} is the basic sequence N = {0, 1, --- } 
Yn = {Y,: n € N} is the incremental sequence 


Equation: 
Yn =Xn-Xni Xn=>S Ve n>0, [Xn =0 <0] 
k=0 
We suppose Zy is a decision sequence and Xn ~ Zy; that is, Xn = gn (Wn) = 9n (Zo, Z1, --+, Zn). 


bs XN ~ ZN iff Yn ~ ZN 
° If Xn ~ An and An ~ Zn, then Xn ~ Zn. In particular, if Hy, = kn (Un) = kn (Xo, X1, +--+, Xn), 
then Xx ~ Hn. 


Definition. If Xy is integrable and Zy is a decision sequence, then 


1. Xy is a martingale (MG) relative to Zy iff 
Equation: 


Xn ~ Zp and F[Xn4|W,] =X, as. VneN 


2. Xv is a submartingale (SMG) relative to Zy iff 
Equation: 


Xn ~ Zn and E[Xnui|Wr] > Xnas. VneN 


3. Xy is a supermartingale (SRMG) relative to Zy iff 
Equation: 


Xn ~ Zn and E[Xnui|Wr] <Xnas. VneEeN 


Notation. When we write (Xn, Zn) is a martingale (submartingale, supermartingale), we are asserting Xj is 
integrable, Zy is a decision sequence, Xx ~ Zy, and Xy is a MG (SMG, SRMG) relative to Zy. 


Definition. If Yy is integrable and Zy is a decision sequence, then 


1. Yy is absolutely fair relative to Zy iff 
Equation: 


Yn~ Zp and E[Y,4|W,] =0as. VneN 


2. Yy is favorable relative to Zy iff 
Equation: 


Yn~ Zn and E[Ynui|W,] >0as. VneN 


3. Yy is unfavorable relative to Zy iff 
Equation: 


Yn~ Zn and EF[Ynui|Wr]<O0as. VneN 


Notation. When we write (Yn, Zn) is absolutely fair (favorable, unfavorable), we are asserting Yj is 
integrable, Zy is a decision sequence, Yn ~ Zn, and Yjy is absolutely fair (favorable, unfavorable) relative to 
Zn. LXA2-2 

IXA2-1 


If Xy is a basic sequence and Yj is the corresponding incremental sequence, then 


1. (Xn, Zn) is a martingale iff (YN, Zn) is absolutely fair. 
2. (Xn, Zn) is a submartingale iff (YN, Zn) is favorable. 
3. (Xn, Zn) is a supermartingale iff (Yn, Zn) is unfavorable. 


Let * be any one of the symbols =, >, or <. Then by linearity and (CE7) 
Equation: 


E[Xns1|Wa] = E[Yne1|Wa] + E[Xn|Wa] = E[Ynsi|We) + Xn*Xn as. iff E[YpuilW,]*0 as. 


Remarks 


i (XN, Zn) is a SMG iff (—Xn, Zn) isaSRMG 

2. We write (S)MG to indicate the same statement can be made for a MG or a SMG with the appropriate 
choice of = or > 

3. We write (>) to indicate simultaneously two cases: 


O° 


(>) read as = in all places (for a MG) 
o (>) read as > in all places (for a SMG) 


Some Basic Patterns 
IXA3-1 
If (Xn, Zn) is a(S)MG and Xy ~ Hy, with Hy ~ Zn, then (Xn, Hn) is a (S)MG. 


Let K,, = (Ho, Mi, ---, Hn). By (CE9), the (S)MG definition, monotonicity, and (CE7) 
Equation: 


E[Xn+ilKn] = B{E[XnealWallKn} (2) E[Xn|Kn] =Xn as. 
IXA3-2 
For integrable Xx ~ Zy, the following are equivalent 


(Xn, Zn) isa (S)MG 
E[Xniz|Wr] (>) Xn as. Vo n,keEN 
ElIcXn4| (>) EllcX,] V Ceao(W,) Vo nen 
E [oXn+k] (>) ElIcXy] V Ce a (W,,) Vn ken 


ao oo 8 


b aas a special case 
a bBy (CE9), (a), and monotonicity 
Equation: 


E [Xnizl|Wr] _ E{E [Xn+e|Wr+e1]|Wn} (>) E [Xn+e—11W,,] a. 5S. 


k — 1 iterations yield E [|Xn+xn|Wn] (>) Xn a.s. 

dcas a special case 

c aBy (CE1) and (c), EF IcXn41] = E {IcE [Xn+i|Wn]} (>) E[IcXn] a.s.. Since Xn ~ Wp a.s. 
and F [X,41|W,] ~ W,, a.s., the result follows from the uniqueness property (E5), 

¢ b dBy (CE1), (b), and monotonicity EF [Ico Xn+%] = E {(IcE |Xnizk|Wn]} (=) E[LcXn!] 


We thus haved >c>aSb>d 
IXA3-3 


If (Xn, Zn) is a(S)MG, then B[Xnys] (2) B[Xn] (2) B [Xo] 
IXA3-4 


(Xn, Zn) is a(S)MG iff E[X, — X,|W,] (>)0as. Vn<p<qeN 
EXERCISE. Note X,— Xp=Ypiit --- + Yj 


IXA3-2 
IXA3-5 


If (Xn, Zn) is an L? MG, then 


E|X, — X,| =0 Vv p<qeN 
E[Xn (Xq— Xp)| =0 Y nx p<¢qeN 
E|(Xn — Xm) (Xq — Xp)| = 0 Ym<n<p<qeNn 
E[X,X,] = EB [Xp Vv pPaEN 
E|(X,-X,)"|=#[X3]-E[x}]>0 ¥  p<qeN 
B [x2] = DL) BIA v pen 


a. E[X, — X,| = E{E[X, — X,|W,,]} = 0 by (CE1b) and Thm IXA3-4 

b. E[X,, (X, — X,)] = E{X,,E[X, — X,|W,]} = 0 by (CE1b), (CE8), and Thm IXA3-4 

c. As inb, since X, — Xm ~ Wy 

d. Suppose p < q. Then, since Xp ~ Wy, E[XpXjq] = E{X,E[X,|W,]} = E |X? ] by definition of MG. 
For q < p, interchange p, qg in the argument above. 

e. E|(Xq— Xp)”| = B[X2] - 2B[XpXq + E[X}] = B[X}] - 2B [X32] + B[X3] bya, above 


Pp : Pp 
f. By c, B[Y;Y,] = 0 for j Ak. Hence, E[X2] =E (x: “| SoS) EY = 0 F [YZ] 


jk k=0 


A variety of weighted sums of increments are useful. 
IXA3-6 


Suppose (Xj, Zn) is a (S)MG and Yy is the incremental sequence. Let Ho be a (nonnegative) constant and let 


HH, ~ W,y-1, n = 1, be bounded (nonnegative). Set 
Equation: 


X= So mY; = Svy, VneN 
k=0 k=0 


Then (Xj, Zn) isa (S)MG. 
E[Ynii1|Wa] = ElAn+1Yn+1|Wnl = Hn+iE [Yn+i|Wn] a.s. by (CE8) 
For MG case: E [You |W] = 0 a.s. for arbitrary bounded H,, 


For SMG case: B[You |W] >0a.s. for H, > 0, bounded 


The conclusion follows from [link] 


Remark. This result extends the pattern in the introductory gambling example. [link] IXA3-3 
IXA3-7 


In Theorem IXA3-6, if H[Xo] > 0and0 < H,, < la.s.Vn EN, then0 < # Ee 


n 


| < E[X,])vn €N 


E\Yn41|Wn] > Ani E [Yni1|Wn] (=) 0 a.s., by hypothesis, and 
Ani FE [Ynui|W,] = £ [Y,. “4 |W] a.s., by (CE8). Thus, by monotonicity and (CE1b) 
Equation: 


* 


E[Ynu] (>) E [Yn (>)0 VneEN and E[Y¥] = E[Xo] > WoL [Yo] = £ [¥o | 


Hence 
Equation: 


Some important special cases 
IXA3-8 


Suppose integrable Xy ~ Zy. If Xn41 — Xn (>)0a.s.Vn EN, then (Xn, Zn) is a (S)MG. 


Apply monotonicity and Theorem IXA3-4 
IXA3-9 


Suppose Xj has independent increments. 


a. If E[X,] = c, invariant with n, then Xy is a MG. 
b. If B[|Xnii — Xn] (>)0, Vn EN, then (Xn is a (S)MG. 


b. For any n, consider any C € o(U,,). By independent increments, {I¢, (Xn+1 — Xn)} is independent. 
Hence, EF [I¢Xn41] — E [IcXn] = E [Ic (Xnai — Xn)| = Ec] E [(Xn+1 — Xn)] (=)0. The desired 
result follows from Theorem IXA3-2(c), 


IXA3-10 
Suppose g is a convex Borel function on an interval I which contains the range of all X,, and 
Ellg(Xn)|] < Vn EN, Let H, = 9(X,)Vn EN, 


a. If (Xn, Zn) isa MG, then (Hn, Zn) is a SMG. 
b. If g is nondecreasing and (Xn, Zn) is a SMG, then so is (Hn, Zn) 


e a. By Jensen's inequality and the definition of a MG 
Equation: 


Eg (Xni1)|Wa) = 9 ElXnti|Wnl) = 9 (Xn) a-s. 


e b. By Jensen's inequality 
Equation: 


E[g(Xna1)|Wn] > 9(E|Xnaa|Wn]) a8. 


Since FE |Xni1|W,] > X, a.s. and g is nondecreasing, we have 
Equation: 


GV E|Xn+i|Wr]) > g(Xn) as. 


Some commonly utilized convex functions 


1. g(t) = |t| 

2. g(t) = t’g is increasing for t > 0 

3. g(t) = u(t)t g(Xn) = X7g nondecreasing for all t 
4, g(t) = —u(—t)tg (X,) = X,g nonincreasing for all t 
5. g(t) =e”, a> Og is increasing for all t 


IXA3-11 


Consider integrable Xy ~ Zn. 


* 1 * 
alf B[Xnii1|W,] = aX,a.s.Vn and X, = —X,Vn, then (Xy, Zn) isa MG 
a” 


* 1 * 
b. If E[Xnii|Wr] > aXna.s.,a > 0,Vn and X,, = —X,,Vn, then (Xy, Zn) isa SMG 
a” 
Equation: 
E [X44 |Wa] = 


1 * 
wee [Xn+ilWr| (>) grt oXn = X,, a.S. 


The restriction] a > 0 is needed in the > case. 


Martingale Sequences: Examples and Further Patterns 


Examples and further patterns 


A4-1 Sums of Independent Random Variables 


Suppose Yy is an independent, integrable sequence. Set X,, = S> Y, Vn>0. 
k=0 


If E[Y,] (=) 0 Vn > 1, then Xy is a (S)MG. 
A4-2 Products of nonnegative random variables 


Suppose Yn ~ Zn, Y, > Oa.s. Vn. Consider Xy : Xy, = c][™ c>0. 
k=0 


If EB [Y¥n41|Wn] (=) 1 a.s. Vn, then (Xn, Zn) is a (S)MG 


Xy~ W,, and Xni1 = Yn41Xn. Hence, FE [Xnyi|Wr] = Xn E[Znii|Wn] (2) Xn as. Vn 
A4-3 Discrete random walk 


Consider Yo = O and {Y,,: 1 < n} iid. Set X, = ALD > 0. Suppose P(Y,, = k) = pg. Let 
k=0 
Equation: 


gy (s)=E [s**] = So pxs*, s>0 
k 


Now gy (1) =1, gy (1)=E [Ya], gy (s) = 5>k(k—1)pxs*? > 0 for s > 0. Hence, gy (s) = 1 has at 
k 
most two roots, one of which is s = 1. 


a. § = Lis a minimum point iff E/Y,,] = 0, in which case Xy is a MG (see A4-1) 


b. If gy (r) = 1 for0 <r < 1, then B[r’] = 1Vn > 1. Let Zy = 1,2, =r = | [ r%. By A4-2, Zyisa 
k=1 


MG 
For the MG case in Theorem IXA3-6, the Y,, are centered at conditional expectation; that is 


E\Yn41|W,,] = 0 a.s. The following is an extension of that pattern. 
A4-4 More general sums 


Consider integrable Yy ~ Zy and bounded Hy ~ Zy. Let W,, = a constant forn < 0 and H, = 1 forn < 0. 


Set 
Equation: 


Xn = S> {Yi _ E|Y;,|We_1|} Hr—1 Vn > 0 
k=0 


Then (Xn, Zn) is a MG. 
Xp ~ W,3Vn > Oand E[Xy41|Wr] = Xn + BaF {Yaw — E[Yn|Wal|Wr} = Xn + 0a-s. 


IXA4-2 


A4-5 Sums of products 


Suppose Yy is absolutely fair relative to Zy, with E||Ynl|*] < ooVn, fixed k > 0. Forn > k, set 
Equation: 


a ae eZ 


1rtg** 
O0<t1<---<n 


LY; 


Uk 


Then (Xn,, 2n,) Nz = {k, hk +1, k+2,--- } isa MG, 


Xnyi = Xn + Kn41, where 
Equation: 


Ken =Yeu >) YaYa Yau =Younk, Ki ~ We 


0<ij<---<n 


Equation: 


E[Kns1|Wo] = K,E[You|We] =O0a.8. VYn>k 


We consider, next, some relationships with homogeneous Markov sequences. 


Suppose (Xn, Zn) is a homogeneous Markov sequence with finite state space # = {1, 2, ---, M} and 
transition matrix P = [p(2, 7)]. A function f on E is represented by a column matrix 

f=([f (1), f (2), --:, f(M)]". Then f(X,) has value f(k) when X, = k. Pf isan m x 1 column matrix and 
Pf (J) is the jth element of that matrix. Consider E [f (Xn+1)|Wn] = E [f (Xn+1)|Xn] a.s.. Now 

Equation: 


E[f (Xni1)|Xn = J] = D0 F(R)p (Gk) = Pf (3) so that Elf (Xn+1)|Wn] = Pf (Xn) 
kcE 


A nonnegative function f on E is called (super)harmonic for P iff Pf (<) f. 
A4-6 Positive supermartingales and superharmonic functions. 


Suppose (Xj, Zn) is a homogeneous Markov sequence with finite state space H = {1, 2, ---, M} and 
transition matrix P = [p(i, 7)]. For nonnegative f on E, let Y, = f (Xn) Vn © N. Then (Yn, Zn) isa positive 
(super)martingale P(SR)MG iff fis (super)harmonic for P. 


As noted above FE [f (Xn+1)|Wn] = Pf (Xn). 


1. If fis (super)harmonic Pf (Xn) (<) f (Xn) = Yn, so that 
Equation: 


EY 4 | Wel (S) Kg a8: 


2. If (Yn, Zn) is a P(SR)MG, then 
Equation: 


Yn, = f (Xn) (©) Elf (Xna1)|Wr] = Pf (Xn) a.s., so that f is (super)harmonic 


IX A4-3 


An eigenfunction f and associated eigenvalue A for P satisfy Pf = Af (i-e., (AI — P)f = 0). In most cases, 
|A| < 1. For real A, 0 < A < 1, the eigenfunctions are superharmonic functions. We may use the construction of 
Theorem IXA3-12 to obtain the associated MG. 

A4-7 Martingales induced by eigenfunctions for homogeneous Markov sequences 


Let (Yn, Zn) be a homogenous Markov sequence, and f be an eigenfunction with eigenvalue A. Put 
X, = "Ff (Y,). Then, by Theorem JAXA3-12, (Xn, Zn) isa MG. 
A4-8 A dynamic programming example. 


We consider a horizon of N stages and a finite state space E = {1, 2,---, M}. 


e Observe the system at prescribed instants 
e Take action on the basis of previous states and actions. 


Suppose the observed state is j and the action is a € A. Two results ensue: 


1. A return r(j, a) is realized 
2. The system moves to a new state 


Let: 

Y,, = state in nth period, O<n<N —1 

A, = action taken on the basis of Yo, Ag, ---, Yn-1, An-1, Yn 
[Ag is the initial action based on the initial state Yo] 


A policyr is a set of functions (79, 71, -+-,N_1), such that 
Equation: 


Ay = in (Yo; Ag, +++, Yn-1; An-1, Y,,) 0<n<N-1 


The expected return under policy 7, when Yp = jo is 
Equation: 


The goal is to determine 7 to maximize R(7, jo). 


Let Z, = (Y;, Ax) and W, = (Zo, Z1, +--+, Zn). If {Y¥, : O< & < N — 1} is Markov, then use of (C19) and 
(C111) shows that for any policy the Z-process is Markov. Hence 
Equation: 


E [Im (Yn+1)| Wn] = E [Im (Yn+1)| Zn] as. Vn: 0<n<N-1, V Borel sets M 


We assume time homogeneity in the sense that 
Equation: 


P (Yn = 9/Yn =, An = 0) = p(jlt, a), invariant withn, Vi,j © E, VacA 


We make a dynamic programming approach 


Define recursively fy, fv_i,---, fo as follows: 


fn (j) = 0,Vj € E. Forn = N, N —1,---,2,1, set 


Equation: 
fn—1 (j) =max {ri a)+ > fn (k ip (klj, a) : cal 
kcE 
Put 
Equation: 


Xn = So fe (Va) — Efe (Ye) Meal} 
k=1 


Then, by A4-2, (Xn, Zn) is a MG, with E[X,] =0, 0<n< Nand 
Equation: 


fn—1 (Yn-1) > r (Yaa » An-1) + SO fn (bp (h|Zn—1) =7 (Yn » Ant) + E [fn (Yn)|Wn-1] 


kcE 


IX A4-4 
We may therefore assert 
Equation: 

N N 
0= E[Xy] = a(S {fr (Ye) — E [fe wim) > a(S {fie (Ye) +7 (Ya-1, Ani) — fr-1 (Ye-1)} 

k=1 k=1 
Equation: 

N-1 N-1 

-»| r (Yu, Az ) + fn (Yn) — fo (Yo })- = aba Yr, Az )| — E[fo (Yo)] 
k=0 k=0 


Hence, R(m, Yo) < E[fo (Yo )]. For Yo = jo, R(x, jo) < fo (Jo). Ifa policy 2° can be found which yields 
equality, then m’ is an optimal policy. 


The following procedure leads to such a policy. 


e For each j € E, let T1 (Yo, Ao, Yi, Ai, +--+, An—2,J) = T1 (3) be the action which maximizes 
Equation: 


)+ So fn (k)p (RJ, 2) = 7 (5, 2) + E[fn (Yn)[¥n1 = 5, Ana =a] 
keE 
Thus, A, = 1, (Y,). 
* Now, fn—1(Yn-1) =? (Yn-1, An_1) — E[f, [fn (Yn)|Zn—1]+ which yields equality in the argument above. 
Thus, R (x : i= fo (j) and nm” is optimal. 


Note that mt” is a Markov policy, A = 7, (Y,,). The functions f,, depend on the future stages, but once determined, 
the policy is Markov. 
A4-9 Doob's martingale 


Let X be an integrable random variable and Zy an arbitrary sequence of random vectors. For each n, let 
X, = E[X|W,]. Then (Xn, Zn) is a MG. 
Equation: 


E\|Xn |] = E{|E[X|Wr] |} < E{E||X||Wn |} = E[|X|] < 00 
Equation: 

E[Xn4i1|Wn] = E{E[X|Wrsi]|Wn} = E[X|W,] = Xp a.s. 
A4-9a Best mean-square estimators 


If X € L?, then X, =E [X|W,,] is the best mean-square estimator of X, given W,, = (Zo, Z1, ---, Zn). 
(Xn, Zn) isa MG. 
A4-9b Futures pricing 


Let Xj be a sequence of “spot” prices for a commodity. Let tg be the present and tg + T' be a fixed future. The 
agent can be expected to know the past history U;, = (Xo, Xi, --+, Xz,), and will update as t increases beyond 
to. Put Y, = E[X;,,7|U;,+14], the expected futures price, given the history up to ty) +k. Then {Y,: O0< kK <T} 
is a Doob's MG, with Y = X,,+7, relative to {Z, : 0 < k < T}, where Zp = U;, and Z, = Xz,+4 for 
1<k<T. 

A4-9c Discounted futures 


Assume rate of return is r per unit time. Then a = 1/(1 +1) is the discount factor. Let 
Equation: 


VYya=E [aT *X1.47|Vi.+4] =a" *Y; 
Then 
Equation: 


E [Ve+1|Ut,+4] = a7 "EF [Ying |Ut,44] = a? * 1, > aT FY, = Vi, as. 


Thus {V,: 0<k < T} isa SMG relative to {Z,: O<k< Th}. 


Implication from martingale theory is that all methods to determine profitable patterns of prediction from past 
history are doomed to failure. 


IX A4-5 
A4-10 Present discounted value of capital 


If a = 1/(1 +1) is the discount factor, X,, is the dividend at time n, and V,, is the present value, at time n, of all 
future returns, then 
Equation: 


oo oe) [oe] 

k k k-1 

a ; a" Xniz so that Vrs = ; a" Xniky1 = ; aX" Xnik 
k=1 k=1 k=2 


Equation: 


1 [oe 
= — Slat Xnte = Xa = (14+ r)Va — Xa 
O FI 


Note that Visi (>) Vn iffr (>) Xnai/Vn. Set Y, = E[V,,|U,]. Then 
Yau = (14+ r)E [Vn |On+i] — Xn+1 a.s. so that 
Equation: 


Thus, (Yn, Xn) is a (S)MG iff 
Equation: 


E{Xn41|Un] _ Expected return next period, given U,, 


r (2) = 


E\Vn|Un] Expected present value, given U,, 


Summary: Convergence of Submartingales 
The submartingale convergence theorem 


n 


If (Xx, Zn) is a SMG with lim E [x7] < oo, then there exists X,, ~ W,, such that X,, > X, a.s. 


Uniform integrability and some convergence conditions 


Definition. The class {X; : t € T} is uniformly integrable iff 
Equation: 


sup {E[I,x,|>a}|Xe|]:  € T} 30 asa — oo 


Any of the following conditions ensures uniform integrability: 
1. The class is dominated by an integrable random variable Y. 
2. The class is finite and integrable. 
3. There is au.i. class {Y; : ¢ € T} such that |X; |<|Y;| a.s. for allt € T. 
4. X integrable implies Doob's MG {X,, = E[X|W,,]: n € N} is ui. 


Definition. The class {X,; : t € T} is uniformly absolutely continuous iff for each € > 0 there is a 6 > 0 such 
that P(A) < 6 implies sup {EL04] Xe :teTl<e. 
T 


Xp = {X;: t € T} is ui. iff both (i) X7 is u.a.c., and (ii) sup {E[|X¢|]} < co. 
T 


P 
Definition. X,, > X iff P(|X, — X| > €) > asn— oo foralle > 0. 
Equation: 


X, 3X iff El]X,—X|?] > 0asn— oo 


P 
X,— X a.s. implies X, > X 


If (i) X, z X, (ii) X, > X a.s. and (iii) lim E'[X,,|Z] exists a.s., then 
Equation: 


lim E|X,,|Z] = E[X|Z] as. 


Suppose (Xn, Zn) is a (S)MG. Consider the following 


° (A) lim E [Xz] < co or, equivalently, sup E[|Xp |] < 00 ------------ 
n 


¢ (a) Xy is uniformly integrable. 
¢ (a) Xq is uniformly integrable. 


¢ (b) X, — X. 
e (b) X77 > X. 


n 
¢ (c) There is an integrable X.. ~ W. such that 


Equation: 


X, > Xo as. and E[X—|Wr] (>) Xnas. VneN 


¢ (c) Condition (c) with > even for a MG. 
¢ (d) There is an integrable X with EF [X|W,,] (>) Xn a.s. VneN 
e (d) Condition (d) with > even for a MG. 


Then 


1. Each of the propositions (a) through (d’) implies (A), hence SMG convergence. 
2. (a) => (a*) 

3.6 (b> O= 

4a) (7) Se (Cc)S @) 

5. For a MG, (d) => (a), so that (a) & (b) S (c) > (d) 


The notion of regularity is characterized in terms of the conditions in the theorem. 


Definition. A martingale (X~, Zn) is said to be martingale regular iff the equivalent conditions (a), (b), (c), (d) 
in the theorem hold. 


A submartingale (Xj, Zn) is said to be submartingale regular iff the equivalent conditions (a*), (b*), (c’), (d’) 
in the theorem hold. 


Remarks 


1. Since a MG is a SMG, a martingale regular MG is also submartingale regular. 

2. It is not true, in general that a submartingale regular SMG is martingale regular. We do have for SMG (a) = 
(b) => (c) > (d). 

3. Regularity may be viewed in terms of membership of X,, in the (S)MG. The condition 
E |X x|Wn] (>)Xna.s. is indicated by saying X,, belongs to the (S)MG or by saying the (S)MG is closed 
(on the right) by Xx. 


Summary 
For a martingale(Xjx, Zn) 


1. If martingale regular, then X,, > X,. ~ W.a.s. and X,, = E[X,.|W,,]a.s.Vn © N 
E[Xnsk|Wr] = ELE [X0|Wr+e||Wr} = EF [X0|Wr] = Xna.s. and 
E|Xo] = E [Xp] = E[X~|Vn €N 

2. If submartingale regular, but not martingale regular, then X, > X. ~ Wa.s. but 
E|X0|Wn] > Xna.s.Vn € N and E [Xo] = E [Xz] < E[Xa] < oovn EN 


For a submartingale(Xn, Zn) 


Either martingale regularity or submartingale regularity implies 


Xn Xo ~ Wo a.s. and Xp < E[Xnsi|Wn] < E[X~|Wr] as. VnEeN 


and £ [Xo] < E[Xn] < E[X~|<oo Vn EN 
If Xy is uniformly integrable, then E [X;,| > E [Xoo]. 


If (Xx, Zn) is a MG with E[X2] < KVniN, then the proceess is MG regular, with 
Equation: 


Xn —> Xoo as. E|(Xo — Xn)’] +0 and E[X,]=E[Xx] VneN 


A Reorder problem-- Electronic Store 

As an example of a reorder problem we consider an electronic store which reorders a 
certain type of unit on a regular weekly basis. Two possible actions: 1. Order two 
units, at a cost of $150 each; 2. Order four units, at a cost of $120 each. An optimal 
demand policy, based on sales the previous week is sought. 


A reorder problem 


Example: 

An electronic store stocks a certain type of DVD player. At the end of each week, an 
order is placed for early delivery the following Monday. A maximum of four units is 
stocked. Let the states be the number of units on hand at the end of the sales week: 
E = {0, 1, 2, 3, 4}. Two possible actions: 


e Order two units, at a cost of $150 each 
e Order four units, at a cost of $120 each 


Units sell for $200. If demand exceeds the stock in hand, the retailer assumes a 
penalty of $40 per unit (in losses due to customer dissatisfaction, etc.). Because of 
turnover, return on sales is considered two percent per week, so that discount is 

a = 1/1.02 ona weekly basis. 

In state 0, there are three possible actions: order 0, 2, or 4. In states 1 and 2 there are 
two possible actions: order 0 or 2. In states 3 and 4, the only action is to order 0. 
Customer demand in week n + 1 is represented by a random variable D,,.1. The 
class is iid, uniformly distributed on the values 0, 1, 2, 3, 4. If X;, is the state at the end 
of week n, then {X,,, D,,.1} is independent for each n. 

Analyze the system as a Markov decision process with type 3 gains, depending upon 
current state, action, and demand. Determine the transition probability matrix PA 
(properly padded) and the gain matrix (also padded). Sample calculations are as 
follows: 


State 0, action 0: Poo (0) = 1 (all otherpo; (0) = 0 
StateO; action 2). pon ( 2 (D2 2) — 3/5. pon 2) =D — 1) = 1/5. eta. 
State 2, action 2: Da, (kh) =) oy kh — OF 253, 4 
For state = i, action = a, and demand = k, we seek g(z, a, k) 
g(0,0,k) = —40k 0 -40 -80 -120 -160 
9(0, 2, k) = —300 + 200 min {k, 2} — 40 max {k — 2,0} -300 -100 100 60 20 
g(0, 4, &) = —480 + 200k -480 -280 -80 120 320 


1. Complete the transition probability table and the gain table. 


2. Determine an optimum infinite-horizon strategy with no discounting. 

3. Determine an optimum infinite-horizon strateby with discounting (alpha = 
1/1.02). 

4. The manager decides to set up a six-week strategy, after which new sales 
conditions may be established. Determine an optimum strategy for the six-week 
period. 


Data file 


% file orderdata.m 
% Version of 4/5/94 
% Data organized for computation 
type = 3, 
states = 0:4; 
= [0 2 4... % Actions (padded) 
0 2 02 
0 2 02 
© 00 00 . 
0 00 00]; 
= [0 -300 -480 ... % Order costs (padded) 
-300 -300 
-300 -300 
O aoe 
0]; 
200; % Selling price 
40; % Backorder penalty 
0.2*ones(1,5); % Demand probabilities 


OO0000 


ep) 
U 


ie) 
vU 
iu ioe 


U 
iw} 


Transition Probabilities and Gains 


The procedure 


% file reorder.m 

% Version of 4/11/94 

% Calculates PA and GA for reorder policy 
states = input('Enter row vector of states '); 


A = input('Enter row vector A of actions (padded) ‘'); 

C = input('Enter row vector C of order costs (padded) '); 

D = input('Enter row vector D of demand values '); 

PD input('Enter row vector PD of demand probabilities '); 


SP = input('Enter unit selling price SP '); 
= input('Enter backorder penalty cost BP '); 
length(states'); 
length(A); 
a = q/m; 
length(D); 
ones(na,1)*states; 


S(:)'; 


[d,s] = meshgrid(D,S); 
a = A'*ones(1,N); 


ca = C'*ones(1,N); 

TA =(s+a-d).*(S +a-d >= 0); 
for i = 1:q 

PA(i,:) = tdbn(states, TA(i,:),PD); 
end 

PA 


GA = ca + SP*d - (SP + BP)*(d -s -a).*(d > Sta) 
The calculations 


orderdata 

reorder 

Enter row vector of states states 

Enter row vector A of actions (padded) A 

Enter row vector C of order costs (padded) C 
Enter row vector D of demand values D 

Enter row vector PD of demand probabilities PD 
Enter unit selling price SP SP 

Enter backorder penalty cost BP BP 


PA = 

1.0000 0 0 0 0) 
0.6000 0.2000 0.2000 0 0) 
0.2000 0.2000 0.2000 0.2000 0.2000 
0.8000 0.2000 0 0 0) 
0.4000 0.2000 0.2000 0.2000 0 
0.4000 0.2000 0.2000 0.2000 0 
0.6000 0.2000 0.2000 0 0) 
0.2000 0.2000 0.2000 0.2000 0.2000 
0.2000 0.2000 0.2000 0.2000 0.2000 
0.4000 0.2000 0.2000 0.2000 0 
0.4000 0.2000 0.2000 0.2000 0 
0.4000 0.2000 0.2000 0.2000 0 
0.2000 0.2000 0.2000 0.2000 0.2000 
0.2000 0.2000 0.2000 0.2000 0.2000 
0.2000 0.2000 0.2000 0.2000 0.2000 
GA = 


0 -40 -80 -120 -160 
-300 -100 100 60 20 
-480 -280 -80 120 320 
0 200 160 120 380 
-300 -100 100 300 260 
-300 -100 100 300 260 
0 200 400 360 320 
-300 -100 100 300 500 
-300 -100 100 300 500 
200 400 600 560 
200 400 600 560 
200 400 600 560 
200 400 600 800 
200 400 600 800 
200 400 600 800 


OOO000 


Infinite-horizon strategy (no discounting) 


polit 

Data needed: 

Enter case number to show gain type case 

Enter row vector of states states 

Enter row vector A of possible actions A 

Enter value of alpha (= 1 for no discounting) 1 
Enter matrix PA of transition probabilities PA 

Enter matrix GA of gains GA 

Enter row vector PD of demand probabilities PD 

Index Action Value 


OOO0O000 


15 400 

Initial policy: action numbers 
21111 

Policy: actions 

2000 0 


New policy: action numbers 
S22 a 

Policy: actions 

42200 


Long-run distribution 

0.2800 0.2000 0.2000 0.2000 0.1200 
Test values for selecting new policy 
Index Action Test Value 


1.0000 0 -248 .0000 
2.0000 2.0000 -168 .8000 
3.0000 4.0000 -41.6000 
4.0000 0 -48.8000 
5.0000 2.0000 -5.6000 

6.0000 2.0000 -5.6000 

7.0000 0 131.2000 


8.0000 2.0000 138.4000 
9.0000 2.0000 138.4000 
10.0000 0) 294.4000 
11.0000 0) 294.4000 
12.0000 0) 294.4000 
13.0000 0) 438.4000 
14.0000 0 438.4000 
15.0000 0) 438.4000 
Optimum policy 

State Action Value 

0) 4.0000 -168 .0000 
1.0000 2.0000 -132.0000 
2.0000 2.0000 12.0000 
3.0000 0) 168.0000 
4.0000 0) 312.0000 
Long-run expected gain per period G 
126.4000 


Infinite-horizon strategy (with discounting) 


polit 


Data needed: 


type number to show gain type 

row vector of states states 

row vector A of possible actions A 

value of alpha (= 1 for no discounting) 1/1.02 
matrix PA of transition probabilities PA 
matrix GA of gains GA 

row vector PD of demand probabilities PD 


Index Action Value 


1 0 -80 
2 -44 

3 -80 

4 0 112 

2 oe 

6 2 52 

7 0 256 
8 2 100 
9 2 100 
10 0 352 
11 0 352 
12 0 352 


13 0 400 


14 © 400 

15 0 400 

Initial policy: action numbers 
2. eh A 

Policy: actions 

20000 


New policy: action numbers 

S 2°24. 

Policy: actions 

42200 

Test values for selecting policy 
Index Action Test Value 


1.0e+03 * 

0.0010 0 6.0746 
0.0020 0.0020 6.1533 
0.0030 0.0040 6.2776 
0.0040 0 6.2740 
0.0050 0.0020 6.3155 
0.0060 0.0020 6.3155 
0.0070 0 6.4533 
0.0080 0.0020 6.4576 
0.0090 0.0020 6.4576 
0.0100 0 6.6155 
0.0110 0 6.6155 
0.0120 0 6.6155 
0.0130 0 6.7576 
0.0140 0 6.7576 
0.0150 0 6.7576 


Optimum policy 
State Action Value 
.0e+03 * 

0.0040 6.2776 
.0010 0.0020 6.3155 
.0020 0.0020 6.4576 
.0030 0 6.6155 
.0040 0 6.7576 


OO0O00FR 


Finite-horizon calculations 


dpinit 
Initialize for finite horizon calculations 
Matrices A, PA, and GA, padded if necessary 


Enter case number to show gain type case 

Enter vector of states states 

Enter row vector A of possible actions A 

Enter matrix PA of transition probabilities PA 
Enter matrix GA of gains GA 

Enter row vector PD of demand probabilities PD 
Call for dprog 

dprog 

States and expected total gains 

0 1 2 3 4 

-44 112 256 352 400 

States Actions 

0 2 


3 0 

4 0 

dprog 

States and expected total gains 

0 1.0000 2.0000 3.0000 4.0000 
135.2000 178.4000 315.2000 478.4000 615.2000 
States Actions 

0 4 


3 0 

4 0 

dprog 

States and expected total gains 

0 1.0000 2.0000 3.0000 4.0000 
264.4800 300.4800 444.4800 600.4800 744.4800 
States Actions 

0 4 

1 2 

2 2 

3 0 

4 0 

dprog 

States and expected total gains 

1.0000 2.0000 3.0000 4.0000 

390.8800 426.8800 570.8800 726.8800 870.8800 
States Actions 


fo) 4 
1 2 
2 2 
3 fo) 
4 fC) 

dprog 


States and expected total gains 
0 1.0000 2.0000 3.0000 


4.0000 


517.2800 553.2800 697.2800 853.2800 997.2800 


States Actions 


BRWNEF © 
GOGONN SFA 


dprog 
States and expected total gains 
1.0e+03 * 
0) 0.0010 0.0020 0.0030 0.0040 
0.6437 0.6797 0.8237 0.9797 1.1237 
States Actions 


BRWNE © 
OGOONN SA 


Markov Decision -- Type 3 Gains 


Example: 

An electronic store stocks a certain type of VCR. At the end of each week, an order is placed for 
early delivery the following Monday. A maximum of four units is stocked. Let the states be the 
number of units on hand at the end of the sales week: Eb = {0, 1, 2, 3, 4} . Two possible 
actions: 


e Order two units, at a cost of $150 each 
e Order four units, at a cost of $120 each 


Units sell for $200. If demand exceeds the stock in hand, the retailer assumes a penalty of $40 per 
unit (in losses due to customer dissatisfaction, etc.). Because of turnover, return on sales is 
considered two percent per week, so that discount is a = 1/1. 02 on a weekly basis. 

In state 0, there are three possible actions: order 0, 2, or 4. In states 1 and 2 there are two possible 
actions: order 0 or 2. In states 3 and 4, the only action is to order 0. Customer demand in week 
n+ 1 is represented by a random variable D,,,1. The class is iid, uniformly distributed on the 
values 0, 1, 2, 3, 4. If X, is the state at the end of week n, then eX Drv} is independent for 
each n. 

Analyze the system as a Markov decision process with case 3 gains, depending upon current state, 
action, and demand. Determine the transition probability matrix PA (properly padded) and the 
gain matrix (also padded). Sample calculations are as follows: 


e State 0, action 0: poo (0) = 1 (all other pox (0) = 0) 
« State 0, action 2: poo (2) = P(D > 2) = 3/5, pn (2) = P(D = 1) = 1/5, ete. 
¢ State 2, action 2: po;(k) = 1/5,k = 0,1, 2, 3, 4 


For state = i, action = a, and demand = k, we seek g(i, a, k) 


g(0,0,k) = —40k 0 40 80 —120 —160 
g(0, 2,&) = —300 + 200 min {k, 2} — 40 max {k — 2,0} -—300 -—100 100 60 20 
g(0, 4, k) = —480 + 200k 480 —280 —80 120 320 


1. Complete the transition probability table and the gain table. 

2. Determine an optimum infinite-horizon strategy with no discounting. 

3. Determine an optimum infinite-horizon strateby with discounting (alpha = 1/1.02). 

4. The manager decides to set up a six-week strategy, after which new sales conditions may be 
established. Determine an optimum strategy for the six-week period. 


Data file 
% file orderdata.m 
% Version of 4/5/94 


% Data organized for computation 


ee Oe ees 
0 2 02 
0 2 02 
© 00 00 
© 00 60]; 
C = [0 -300 -480 
© -300 -300 
© -300 -300 
0 0 0 


SP = 200; 
BP = 40; 
penalty 


PD = 0.2*ones(1,5); 


Transition Probabilities and Gains 
The procedure 


% file reorder.m 
% Version of 4/11/94 


% Actions (padded) 


% Order costs (padded) 


% Selling price 


% Backorder 


% Demand probabilities 


% Calculates PA and GA for reorder policy 


states = input('Enter row vector of states re 

A = input('Enter row vector A of actions (padded) a 

C = input('Enter row vector C of order costs (padded) aa ie 

D = input('Enter row vector D of demand values 2 

PD = input('Enter row vector PD of demand probabilities a 
SP = input('Enter unit selling price SP y Me 

BP = input('Enter backorder penalty cost BP a 


m = length(states'); 


q = length(A); 

na = q/m; 

N = length(D); 

S = ones(na,i)*states; 

S = S(:)'; 

[d,s] = meshgrid(D,S); 

a = A'*ones(1,N); 

ca = C'*ones(1i,N); 

TA = (Ss ta -d).*(s ta -d >= 0); 
for i = 1:q 

PA(i,:) = tdbn(states, TA(i,:),PD); 
end 

PA 


GA = ca + SP*d - (SP + BP)*(d -s -a).*(d > sta) 


The calculations 


orderdata reorder Enter row vector of states states Enter row 
vector A of actions (padded) A Enter row vector C of order costs 
(padded) C Enter row vector D of demand values D Enter row vector 
PD of demand probabilities PD Enter unit selling price SP SP Enter 
backorder penalty cost BP BP PA = 1.0000 0 0 0 0 0.6000 0.2000 


0 


(Ooo) (Oo) (o} (o) 


. 2000 
4000 
. 2000 
. 2000 
. 2000 
. 2000 
. 2000 
-480 -280 -80 


0 


0. 


. 2000 
. 2000 


© 0.2000 6.2000 0.2000 0.2000 0.2000 0.3000 


0.2000 0.2000 0.2000 0 0.4000 0.2000 0.2000 O. 
0.2000 
OQ. 
0 
0 


© 0 68.2000 0.2000 0.2000 0.2000 0.2000 
2000 0.2000 0.4000 0.2000 0.2000 0.2000 0 O. 

0 0.4000 0.2000 0.2000 0.2000 0 0.2000 

0 


0.2000 
2000 O 
0.2000 
4000 0. 
0.2000 


is) (6) 6 
0.6000 
0.2000 
2000 

0.2000 


.2000 0.2000 0.2000 0.2000 0.2000 0.2000 0.2000 


2000) 0.2000 GA = 0 -40) -80 -120) -160) -300 -100 10060 20 


126) °320' 0 200 160 120 80 -300 -100 100 300 260° =-300 


-106 100 300 260 0 200 400 360 320 -300 -100 100 300 500 -300) -100 
100 300 500 0 200 400 600 560 0 200 400 600 560 0 200 400 600 560 


0 200 400 600 800 0 200 400 600 800 0 200 400 600 800 


Infinite-horizon strategy (no discounting) 


po 


Lit 


Data needed: 


type number to show gain type type 

row vector of states states 

row vector A of possible actions A 

value of alpha (= 1 for no discounting) 1 
matrix PA of transition probabilities PA 
matrix GA of gains GA 


Enter row vector PD of demand probabilities 


Index 


15 


Action Value 
0 -80 
2 -44 
4 -80 
0 112 
2 52 
2 52 
0 256 
2 100 
2 100 
0 352 
0 352 
0 352 
0 400 
0 400 
0 400 
Initial policy: action numbers 
2 1 1 1 
Policy: actions 
2 0 0 0 
New policy: action numbers 
3 2 2 1 
Policy: actions 
4 2 2 0 
Long-run distribution 
0.2800 0.2000 0.2000 
Test values for selecting new policy 
Index Action Test Value 
1.0000 0 -248 .0000 
2.0000 2.0000 -168.8000 
3.0000 4.0000 -41.6000 
4.0000 0 -48.8000 
5.0000 2.0000 -5.6000 
6.0000 2.0000 -5.6000 
7.0000 0 131.2000 
8.0000 2.0000 138.4000 
9.0000 2.0000 138.4000 
10.0000 0 294.4000 
11.0000 0 294.4000 
12.0000 0 294.4000 
13.0000 0 438.4000 
14.0000 0 438.4000 
15.0000 0 438.4000 
Optimum policy 
State Action Value 
0 4.0000 -168.0000 


PD 


0.1200 


1.0000 2.0000 -132 .0000 


2.0000 2.0000 12.0000 
3.0000 0 168.0000 
4.0000 0 312.0000 


Long-run expected gain per period G 
126.4000 


Infinite-horizon strategy (with discounting) 


polit 

Data needed: 

Enter case number to show gain type type 
Enter row vector of states states 

Enter row vector A of possible actions A 
Enter value of alpha (= 1 for no discounting) 
Enter matrix PA of transition probabilities 
Enter matrix GA of gains GA 

Enter row vector PD of demand probabilities 


Index Action Value 
1 0 -80 
2 2 -44 
3 4 -80 
4 0 112 
5 2 52 
6 2 52 
7 0 256 
8 2 100 
9 2 100 
10 0 352 
11 0 352 
12 0 352 
13 0 400 
14 0 400 
15 0 400 
Initial policy: action numbers 
2 1 1 1 
Policy: actions 
2 0 0 0 
New policy: action numbers 
3 2 2 1 
Policy: actions 
4 2 2 0 
Test values for selecting policy 
Index Action Test Value 
1.0e+03 * 


0.0010 0 6.0746 


PA 


PD 


1/1.02 


0.0020 0.0020 6 
0.0030 0.0040 6 
0.0040 0 6 
0.0050 0.0020 6 
0.0060 0.0020 6 
0.0070 0 6 
0.0080 0.0020 6 
0.0090 0.0020 6 
0.0100 0 6 
0.0110 0 6 
0.0120 0 6 
0.0130 0 6 
0.0140 0 6 
0.0150 0 6 
Optimum policy 

State Action Value 
1.0e+03 * 

0 0.0040 6.277 
0.0010 0.0020 6.315 
0.0020 0.0020 6.457 
0.0030 0 6.615 
0.0040 0 6.757 


Finite-horizon calculations 


dpinit 


1533 
.2776 
.2740 
.3155 
.3155 
.4533 
.4576 
.4576 
.6155 
.6155 
.6155 
. 7576 
. 7576 
. 7576 


6 
5 
6 
5 
6 


Initialize for finite horizon calculations 
Matrices A, PA, and GA, padded if necessary 


Enter type number to show gain 
Enter vector of states states 


type 


type 


Enter row vector A of possible actions A 
Enter matrix PA of transition probabilities 


Enter matrix GA of gains GA 


Enter row vector PD of demand probabilities 


Call for dprog 


dprog 
States and expected total gains 
0 1 2 3 
-44 112 256 352 
States Actions 
0 2 
1 0 
2 0 
3 0 
4 0 
dprog 


States and expected total gains 
0 1.0000 2.000 


0 


400 


3.0000 


PA 


PD 


4.0000 


135.2000 178.4000 315.2000 


States Actions 
4 


BRWNEF © 
OONN 


dprog 


States and expected total gains 
0 1.0000 2.0000 
264.4800 300.4800 444.4800 


States Actions 


BRWNHER © 
OONNSA 


dprog 


States and expected total gains 
0 1.0000 2.0000 
390.8800 426.8800 570.8800 


States Actions 


0 4 

1 2 

2 2 

3 0 

4 0 
dprog 


States and expected total gains 
0 1.0000 2.0000 
517.2800 553.2800 697.2800 


States Actions 


States and expected total gains 


0 4 

1 2 

2 2 

3 0 

4 0 

dprog 

1.0e+03 * 
0 
0.6437 


States Actions 


RWNER © 
OONNSA 


0.0010 
0.6797 


478.4000 


3.0000 
600.4800 


3.0000 
726.8800 


3.0000 
853.2800 


0.0020 
0.8237 


615.2000 


4.0000 
744.4800 


4.0000 
870.8800 


4.0000 
997.2800 


0.0030 
0.9797 


0.0040 
1.1237 


MATLAB Calculations for Decision Models 
Additional examples of Matlab calculations for decision problems (see Matlab Procedures for Markov 
Decision Processes). 


Data 


There are three types. In all types, we need the following: 


A = the vector of actions (1 x m) m =the number of actions 
PH: PHY) =P( A =1;) (1 x s) s =the number of values of H 
PXE + PXE G, 37) = P(X =a =w). (3% q) q =the number of values of X 


e Type 1 The usual type. In addition to the above, we need 
L = ([L (a, yx)| (m xn) m =the number of actions 
PYH: PYH(i,k)|)=P(Y =yz|H =u;) (s xn) mn =the number of values of Y 
¢ Type 2 The matrix RH = |r(a,71)| is given. L and PY H are not needed. 
¢ Type 3 Sometimes Y = H. In this case RH = L, which we need, in addition to the above. 


Calculated quantities 


1.RH =[r(a,t)]| (mx) eS function = expected loss, given H] 
r(a, i) = E[L(a, Y)|H =u] 2s (a, k)P(Y = y,|H = u;) MATLAB: RH = L*PYH' 


2. PX (1 x q) PX (j) = P(X =2;) >) P(H = u)P(X = j|H = ui) 


MATLAB: PX = PH*PXH 
3.PHX (q x s) 
[a,b] = meshgrid(PH,PX) PHX = PXH'.*a./b 
4.RX =|R(a, j)} (mxq) — [Expected risk, given X] 
hile, 9) =2 ira, A) xX =a,|= eae (a, 1)P(H = u;|X = 2;) MATLAB: RX = RH*PHX' 
Select d* from RX: d* (7) is the action a (row number) for minimum expected loss, given X = j. 
Set D = [a" Qa 2), 2 (q)]. 
. Calculate the a. risk BD for d’. 
BD=E R (a (x y= Ds RX (D(j), 3)PX (j) MATLAB: RD*PX' 


un 


(sp) 


Note: Actions are represented in calculations by action number (position in the matrix). In some cases, 
each action has a value other than its position number. The actual values can be presented in the final 
display. 


file dec.m 


% file dec.m 
% Version of 12/12/95 
disp('Decision process with experimentation' ) 


disp('There are three types, according to the data provided. ') 
disp('In all types, we need the row vector A of actions, ') 
disp('the row vector PH with PH(i) = P(H = u_i),') 

disp('the row vector X of test random variable values, and') 
disp('the matrix PXH with PXH(i,j) = P(X = x_j|H = u_i).') 
disp('Type 1. Loss matrix L of L(a,k)') 


disp(' Matrix PYH with PYH(i,k) = P(Y = y_k|H = u_i)') 
disp('Type 2. Matrix RH of r(a,i) = E[L(a,Y)]H = u_i].') 
disp(' L and PYH are not needed for this type.') 


disp('Type 3. Y =H, so that only RH = L is needed.') 


c = input('Enter type number’ '); 
A = input('Enter vector A of actions '); 
PH = input('Enter vector PH of parameter probabilities '); 
PXH = input('Enter matrix PXH of conditional probabilities '); 
X = input('Enter vector X of test random variable values '); 
s = length(PH); 
q = length(X); 
if c == 
L = input('Enter loss matrix L '); 
PYH = input('Enter matrix PYH of conditional probabilities '); 
RH = L*PYH'; 
elseif c == 
RH = input('Enter matrix RH of expected loss, given H '); 
else 
L = input('Enter loss matrix L '); 
RH =L; 
end 
PX = PH*PXH; % (1 x s)(S xX q) = (1 xX q) 
[a,b] = meshgrid(PH, PX); 
PHX = PXH'.*a./b; % (q xX S) 
RX = RH*PHX'; % (mM x S)(S X q) = (m x q) 
[RD D] = min(RX); % determines min of each col 


% and row on which min occurs 
S = [X; A(D); RD]'; 
BD RD*PX'; % Bayesian risk 
[' Optimum losses and actions']; 
[' Test value Action Loss']; 


=> 
Wout il 


disp(['Bayesian risk B(d*) = ',num2str(BD), ]) 


Example: 
General case 


% file deci.m 

% Data for Problem 22-11 

type = 1, 

A = [10 15]; % Artificial actions list 
PH = [0.3 0.2 0.5]; % PH(i) = P(H = i) 

PXH = [0.7 @.2 @.1; % PXH(i,j) = P(X = jJH= i) 


0.2 0.6 0.2; 
0.1 0.1 0.8]; 
ee [ede one 
L=[1 0 -2; % L(a,k) = loss when action number is 
is k 
ee aly ech 


a, 


outcome 


PYH = [0.5 0.3 0.2; % PYH(i,k) = P(Y = k|H = i) 
0.2 0.5 0.3; 
0.1 0.3 0.6]; 

deci 

dec 


Decision process with experimentation 
There are three types, according to the data provided. 
In all types, we need the row vector A of actions, 
the row vector PH with PH(i) = P(H = i), 
the row vector X of test random variable values, and 
the matrix PXH with PXH(i,j) = P(X = j|JH =i). 
Type 1. Loss matrix L of L(a,k) 

Matrix PYH with PYH(i,k) = P(Y = k|H = i) 
Type 2. Matrix RH of r(a,i) = E[L(a,Y)]H = i]. 

L and PYH are not needed in this case. 
Type 3. Y =H, so that only RH = L is needed. 
Enter type number’ type 
Enter vector A of actions A 
Enter vector PH of parameter probabilities PH 
Enter matrix PXH of conditional probabilities PXH 
Enter vector X of test random variable values X 
Enter loss matrix L L 
Enter matrix PYH of conditional probabilities PYH 


Optimum losses and actions 
Test value Action Loss 
-1.0000 15.0000 -0.2667 
0 15.0000 -0.9913 
1.0000 15.0000 -2.1106 


Bayesian risk B(d*) = -1.3 


Intermediate steps in solution of Example 1, to show results of various operations 


RH 

RH = 0.1000 -0.4000 -1.1000 
0.4000 -1.1000 -2.4000 

PX 

PX = 0.3000 0.2300 0.4700 

a 

a = 0.3000 0.2000 0.5000 
0.3000 0.2000 0.5000 
0.3000 0.2000 0.5000 

b 

b = 0.3000 0.3000 0.3000 
0.2300 0.2300 0.2300 
0.4700 0.4700 0.4700 

PHX 


PHX = 0.7000 0.1333 0.1667 
0.2609 0.5217 0.2174 
0.0638 0.0851 @.8511 


RX = -0.1667 -0.4217 -0.9638 
-0.2667 -0.9913 -2.1106 


Example: 
RA given 


% file dec2.m 
% Data for type in which RH is given 


type = 2; 

A= [1 2]; 

1 Gg Real cle 

PH = [0.2 0.5 0.3]; 

PXH = [0.5 0.4 0.1; % PXH(i,j) = P(X = jJH = i) 
0.4 0.5 0.1; 
0.2 0.4 0.4]; 

RH = [-10 5 -12; 


5 -10 -5]; % y(a,i) = expected loss when 
% action is a, given H =i 


dec2 

dec 

Decision process with experimentation 

woe eee eee -------- Instruction lines edited out 
Enter type number type 

Enter vector A of actions A 

Enter vector PH of parameter probabilities PH 
Enter matrix PXH of conditional probabilities PXH 
Enter vector X of test random variable values X 
Enter matrix RH of expected loss, given H RH 


Optimum losses and actions 
Test value Action Loss 
-1.0000 2.0000 -5.0000 
1.0000 2.0000 -6.0000 
3.0000 1.0000 -7.3158 


Bayesian risk B(d*) = -5.89 


Example: 

Example 3 

Carnival example (type in which Y = A) 

A carnival is scheduled to appear on a given date. Profits to be earned depend heavily on the weather. If 
rainy, the carnival loses $15 (thousands); if cloudy, the loss is $5 (thousands); if sunny, a profit of $10 
(thousands) is expected. If the carnival sets up its equipment, it must give the show; if it decides not to 


set up, it forfeits $1,000. For an additional cost of $1,000, it can delay setup until the day before the 
show and get the latest weather report. 

Actual weather H = Y is 1 rainy, 2 cloudy, or 3 sunny. 
The weather report X has values 1, 2, or 3, corresponding to predictions rainy, cloudy, or sunny 
respectively. 

Reliability of the forecast is expressed in terms of P(X = j|H = i)—see matrix PX H 

Two actions: 1 set up; 2 no set up. 

Possible losses for each action and weather condition are in matrix L. 


% file dec3,m 
% Carnival problem 


type = 3; % Y =H (actual weather) 
A= [1 2]; % 1: setup 2: no setup 
Ks ils 2 sie % 1; rain, 2: cloudy, 3: sunny 
L = [16 6 -9; % L(a,k) = loss when action number is a, outcome 
is k 
22 2p % --with premium for postponing setup 
PH = 0.1*[1 3 6]; % P(H = i) 
PXH = 0.1*[7 2 1; % PXH(i,j) = P(X = j|H = i) 
2 2 
i ara A 
dec3 
dec 


Decision process with experimentation 
------------------- Instruction lines edited out 
Enter case number’ case 

Enter vector A of actions A 

Enter vector PH of parameter probabilities PH 
Enter matrix PXH of conditional probabilities PXH 
Enter vector X of test random variable values X 
Enter loss matrix L L 


Optimum losses and actions 

Test value Action Loss 
1.0000 2.0000 2.0000 
2.0000 1.0000 1.0000 
3.0000 1.0000 -6.6531 


Bayesian risk B(d*) = -2.56 


Matlab Procedures for Markov Decision Processes 

We first summarize certain essentials in the analysis of homogeneous Markov chains with costs and rewards 
associated with states, or with transitions between states. Then we consider three cases: a. Gain associated with a 
state; b. One-step transition gains; and c. Gains associated with a demand under certain reasonable conditions. 
Matlab implementations of the results of analysis provide machine solutions to a variety of problems. 


In order to provide the background for Matlab procedures for Markov decision processes, we first summarize 
certain essentials in the analysis of homogeneous Markov chains with costs and rewards associated with states, or 
with transitions between states. References are to Pfeiffer: PROBABILITY FOR APPLICATIONS. 


1. Some cost and reward patterns 
Consider a finite, ergodic (i.e., recurrent, positive, aperiodic, irreducible) homogeneous Markov chain Xj, 
with state space E = {1, 2,---, M}. To facilitate use of matrix computation programs, we number states 
from one, except in certain examples in which state zero has a natural meaning. Associated with this chain is 
a cost or reward structure belonging to one of the three general classes described below. The one-step 
transition probability matrix is P = [p(i, 7)], where p (i, 7) = P (Xni1 = j|Xn = 7). The distribution for 
X,, is represented by the row matrix 7 (n) = [pi (n), p2 (n),---, pa (n)], where p; (n) = P(X, = 1). The 
long-run distribution is represented by the row matrix 7 = |71,72,---, 7m]. 


a. Type 1. Gain associated with a state. A reward or gain in the amount g(?) is realized in the next period 
if the current state is i. The gain sequence{G,, : 1 < n} of random variables G41 = g(Xn) is the 
sequence of gains realized as the chain Xj evolves. We represent the gain function g by the column 
matrix g = [9 (1)g9(2)---9(M)]”. 

b. Type 2. One-step transition gains. A reward or gain in the amount g (2, 7) = gj; is realized in period 
n+ Lif the system goes from state i in period n to state j in period n + 1. The corresponding one-step 
transition probability is p (2, 7) = p;j;. The gain matrix is g = [g(i, j)]. The gain sequence 
{G,, : 1 <n} of random variables Gri = g(Xn,Xn4+1) is the sequence of gains experienced as the 
chain Xj evolves. 

c. Type 3. Gains associated with a demand. In this case, the gain random variables are of the form 
Equation: 


Gri = g (Xn, Dn+1) 


If n represents the present, the random vector U, = (Xo, X1,-+-, Xn) represents the “past” at n of the 
chain Xy. We suppose {D,, : 1 < n} is iid and each {D,,41, Un} is independent. Again, the gain 
sequence{G,, : 1 < n} represents the gains realized as the process evolves. Standard results on 
Markov chains show that in each case the sequence Gy = {Gr :1l< n} is Markov. 


A recurrence relation. We are interested in the expected gains in future stages, given the present state of the 


process. For any n > 0 and any m > 1, the gain ree in the m periods following period n is given by 
Equation: 


on = Grit se Gni2 apes tay Grim 


If the process is in state i, the expected gain in the next period q; is 
Equation: 


Hao” = Bera XeSi 


and the expected gain in the next m periods is 
Equation: 


N 


We utilize a recursion relation that allows us to begin with the transition matrix P and the next-period 
expected gain matrixq = |qiq2°-- am| and calculate the v™) for any m > 1. Although the analysis is 


somewhat different for the various gain structures, the result exhibits a common pattern. In each case 
Equation: 


m-1 
UW =E jor” cere i = E[Gn11\Xn =i] + > E[GasesilXn = 4] 
k=1 
Equation: 
m-1 
=r a Ss E[Grreul|Xn+1 = Jp (i, 9) 
k=1 jek 
Equation: 
m-1 
= Git SCE bs Gntk|Xn = i]nt9 
jek k=1 


We thus have the fundamental recursion relation 
Equation: 


(ol = gi + Sov p(i,3) 


jek 


The recursion relation (*) shows that the transition matrix P = [p(i, 7)| and the vector of next-period 


expected gains, which we represent by the column matrix q = [qi, 42,°--, QM ie determine the ul”, The 
difference between the cases is the manner in which the q; are obtained. 


o Type 1. qi = E [9 (Xn)|Xn = @] = g(%) 

o Type 2. qi = E [9 (Kins Xn41)|Xn = i| =E (9 (4, Xn41)|Xn = i| = jen 9 (i, 5)P (1,9) 

o Type 3. gi = Eg (Xn; Dnsi)|Xn = i] = Elg (i, Dnsi)] = Oy, 9 (i, k)P(D = h) 

o Matrix form. For computational purposes, we utilize these relations in matrix form. If 
Equation: 


T 
v(n) = of dys Le | and q = |qiq2°°: au] 

then 

Equation: 


(*)v(m +1) =q+Pv(m) for all m>0, with v(0) =0 


Examination of the expressions for q;, above, shows that the next-period expected gain matrix q takes the 
following forms. In each case, g is the gain matrix. In case c, pp is the column matrix whose elements are 
P(D =k). 


° Typelq=g 
o Type2q= the diagonal of Pg” 
© Type 3q = gPp 


. Some long-run averages 


Consider the average expected gain for m periods 
Equation: 


ee) 


B| =a =< Pe dong Pel Gn+z|Xn-i]} 


Use of the Markov property and the fact that for an ergodic chain 
Equation: 


1 m 
see k(; 4 : 

y p’ (t,j) > mj} as m— co 
= 


yields the result that, 
Equation: 


jim, B| a ole DiPs Xn-1 = 1) ayn = > tj = 9 
j j 


and for any state i, 


Equation: 
Lm) 

lim E ies, meee =lim —v; ‘=g9 

m—-0o mo m 
The expression for g may be put into matrix form. If the long-run probability distribution is represented by 
the row matrix 7 = [7172 ---my] and q = [qiq2---; amu’, then 
Equation: 

g=7q 


. Discounting and potentials 


Suppose in a given time interval the value of money increases by a fraction a. This may represent the 
potential earning if the money were invested. One dollar now is worth 1 + a dollars at the end of one period. 
It is worth (1 + a)” dollars after n periods. Set a = 1/(1+4 a), so that 0 < a < 1. It takes a dollars to be 
worth one dollar after one period. It takes a” dollars at present to be worth one dollar n periods later. Thus 
the present worth or discounted value of one dollar n periods in the future is @” dollars. This is the amount 
one would need to invest presently, at interest rate a per unit of time, to have one dollar n periods later. If f is 
any function defined on the state space E, we designate by f the column matrix [f (1) f (2) --- f (M)]”. We 
make an exception to this convention in the case of the distributions of the state probabilities 

a(n) = [pi (n)p2 (n)--- py (n)], their generating function, and the long-run probabilities 

TS [7172 tee TM]; which we represent as row matrices. It should be clear that much of the following 
development extends immediately to infinite state space E = N = {0, 1, 2,---}. We assume one of the gain 
structures introduced in Sec 1 and the corresponding gain sequence {G, : 1 < n}. The value of the random 
variable G+; is the gain or reward realized during the period n + 1. We let each transition time be at the 
end of one period of time (say a month or a quarter). If n corresponds to the present, then G+, is the gain in 
period n + k. If we do not discount for the period immediately after the present, then a*-!G,,, is the 
present value of that gain. Hence 

Equation: 


oe) 
peace is the total discounted future gain 
k=1 


Definition. The a-potential of the gain sequence {G,, : 1 < n} is the function g defined by 
Equation: 


(oe) 


p(i)=E bs a"Gns41 


n=0 


Xp =1/Vie Ek 


Remark. y(1) is the expected total discounted gain, given the system starts in state i. We next define a 
concept which is related to the a-potential in a useful way. Definition. The a-potential matrixR® for the 
process Xj is the matrix 


Equation: 
[oe 
R®=') a°P* with, P?=1 
n=0 
3.1) 


Let Xy be an ergodic, homogeneous Markov chain with state space E and gain sequence {G,, : 1 < n}. Let 


qa =[aim---au)" where g; = E[Gn+1|Xn = iJfori € E. For a € (0, 1), let g be the a-potential for the 
gain sequence. That is, 


Equation: 
CoO 
y (i) = B|Y0%Gu Xj=i|VicE 

n=0 
Then, if R° is the a-potential matrix for Xy, we have 
Equation: 

p=R°q 

3.2 


Consider an ergodic, homogeneous Markov chain Xy with gain sequence {G,, : 1 < n} and next-period 
expected gain matrix q. For a € (0, 1), the a-potential g and the a-potential matrix R® satisfy 
Equation: 


[I — aP]R® = Iand [I- aP|y=q 
If the state space E is finite, then 
Equation: 


R? = [I— aP] ‘andy = [I— aP]"'q= R°q 


Example: 

A numerical example 

Suppose the transition matrix P and the next-period expected gain matrix q are 
Equation: 


OP2ZMONS MONS 2 
IPs 0.4 Oil Ob) eel oq = 15 


Oab Ges 0ut 3 
For a = 0.9, we have 
Equation: 
4.4030 2.2881 3.3088 30.17 
R°? = (I—0.9P)7' = ]3.5556 3.1356 3.3088] and y = R°%q = | 32.72 
3.6677 2.2881 4.0441 30.91 


The next example is of class c, but the demand random variable is Poisson, which does not have finite range. 
The simple pattern for calculating the q; must be modified. 


Example: 

Costs associated with inventory under an (m, M) order policy. 
If k units are ordered, the cost in dollars is 

Equation: 


oe 0 k=0 
eee la (ey cl ezy seca 


For each unit of unsatisfied demand, there is a penalty of 50 dollars. We use the (m, M) inventory policy 
described in Ex 23.1.3. Xo = M, and X, is the stock at the end of period n, before restocking. Demand in 
period n is D,,. The cost for restocking at the end of period n + 1 is 

Equation: 


10 +25(M — Xp) + 50 max {Dni1—_M,0} 0<Xn<m 


Xn, Dy = 
a +) ie max {Dni1— Xn,0} m<X,<M 


For m = 1, M = 3, we have 
Equation: 


g (0, Dy) = 85-- 501/3 50) (Dn+1) (Dr4s a 3) 
Equation: 
g(t, Dn41) = BOL 3,00) (Dn+1) (Diet oe i)1 <1<3 


We take m = 1, M = 3 and assume Dis Poisson (1). From Ex 23.1.3 in PA, we obtain 
Equation: 


0.0803 0.1839 0.3679 0.3679 
0.6321 0.3679 0 0 
0.2642 0.3679 0.3679 0 
0.0803 0.1839 0.3679 0.3679 


The largest eigenvalue |A| ~ 0.26, so n > 10 should be sufficient for convergence. We use n = 16. 
Taking any row of P!°, we approximate the long-run distribution by 


Equation: 
m = [0.2858 0.2847 0.2632 0.1663] 


We thus have 
Equation: 


qo = 85+ 50EF [I13,00) (Dnit) (Dns - 3)| = 85+ 50 Me (k—- ae ( the term for k = 3 is xo) 
k= 


oe) [oe) 
For the Poisson distribution soy, = oe Dk: 
k=n k=n-1 


Hence gg = 85 + 50 


ya. 2 Pi = 85 + 50 (0.0803 — 3 x 0.0190] = 86.1668 
k=3 k=4 


q1 = 50> (k—1)p, = 50 yon = Sn = 50p2 = 9.1970 
k=3 k=2 k=3 
qo = 50S. (k — 2)px = 50 [0.2642 — 2 x 0.0803] = 5.1809 q3 = go — 85 = 1.1668 so that 
Ro) 
Equation: 
86. 1668 
_ | 9.1970 
5. 1809 
1.1668 
Then, for a = 0. 8 we have 
Equation: 
1.9608 1.0626 1.1589 0.8178 185. 68 
1.4051 2.1 . 8304 0. 146. 
RS = (1 — 0.8P)"!= 05 785 0.8304 0.5860) ae 6. 09 
1.1733 1.2269 2.1105 0.4893 123. 89 
0.9608 1.0626 1.1589 1.8178 100. 68 


Recall that we are dealing with expected discounted costs. Since we usually consider starting with a full 
stock, the amount y(M) = ¢y(3) = 100. 68 is the quantity of interest. Note that this is smaller than other 
values of (7), which correspond to smaller beginning stock levels. We expect greater restocking costs in 
such cases. 


4. Evolution of chains with costs and rewards 


a. No discounting A generating function analysis of 
Equation: 


(*) v(m+1)=q + Pv(m) for allm > 0, with v(0) =0 


shows 
Equation: 


v(n) =ngl + v + transients, where v = Boq 


Here g = 7q, is a column matrix of ones, Py = P®™, and Bg is a matrix which analysis also shows we 
may approximate by 
Equation: 


Bj) =B(1)+I + P + P? +---+P"— (n + 1)Po 


The largest |A;|< 1 gives an indication of how many terms are needed. 


Example: 
The inventory problem (continued) 
We consider again the inventory problem. We have 


Equation: 
0.0803 0.1839 0.3679 0.3679 86. 1668 
0.6321 0.3679 O 0 9.1970 
12) and q= 
0.2642 0.3679 0.3679 0 5. 1819 
0.0803 0.1839 0.3679 0.3679 1. 1668 


The eigenvalues are 1, 0.0920 + 20. 2434, 0.0920 — 0. 2434 and 0. Thus, the decay of the transients 
is controlled by la" (0. 0920? + 0. 2434?) "2 _ 0.2602. Since |A*|* ~ 0. 0046, the transients die 


out rather rapidly. We obtain the following approximations 
Equation: 


0.2852 0.2847 0.2632 0.1663 
0.2852 0.2847 0.2632 0.1663 
0.2852 0.2847 0.2632 0.1663 
0.2852 0.2847 0.2632 0.1663 


Py = P¥ = so that 7 ~ [0.2852 0. 2847 0. 2632 0. 1663] 


The approximate value of Bg is found to be 
Equation: 


0.4834 —0.3766 —0.1242 0.0174 
0.0299 0.7537 —0.5404 W—0. 2432 
—0.2307 —0.1684 0.7980 W—0.3989 
—0.5166 —0.3766 —0.1242 1.0174 


By ae PP eee Pe = AP ye 


The value g = 7q * 28. 80 is found in the earlier treatment. From the values above, we have 
Equation: 


37.6 28.80n + 37.6 
A 28. + 6.4 
v=Boq~ 5 so that v(n) = Sas + transients 
= 1708 28.80n — 17.8 
—47.4 28.80n — 47.4 


The average gain per period is clearly g ~ 28. 80. This soon dominates the constant terms represented 
by the entries in v. 


b. With discounting Let the discount factor be a € (0, 1). If n represents the present period, then Gpii1 = 
the reward in the first succeeding period G,,,, = the reward in the kth succeding period. If we do not 


discount the first period, then 


Equation: 

GN” = Grit taGni2 tO? Gnya to $0" "Grim = Gnu +aG ey”) 
Thus 
Equation: 


j 


M 
j=l 
In matrix form, this is 
Equation: 
v(n) =q + aPv(n-1) 


A generating function analysis shows that 


Equation: 
v”) =v; +transients 1<i<M 
Hence, the steady state part of 
Equation: 
M 4 M 
yy” =q tad piss”) is i=a ta) p(t jl <i<M 
j=l j=l 


In matrix form, 
Equation: 


v=q + aPv which yields v=(I — aP|7'q 
Since the g; = E [Gn+1|Xn = t] are known, we can solve for the v;. Also, since vl”) is the present 


value of expected gain m steps into the future, the v; represent the present value for an unlimited future, 
given that the process starts in state i. 


5. Stationary decision policies 


We suppose throughout this section that Xv is ergodic, with finite state space E = {1, 2, ---, M}. For each 
state 7 € E, there is a class A; of possible actions which may be taken when the process is in state i. A 
decision policy is a sequence of decision functions d,, dy --- such that 
Equation: 

The action at stage n is dy, (Xo, Xi, ..., Xn). 


The action selected is in A; whenever X,, = 7. We consider a special class of decision policies. Stationary 
decision policy 
Equation: 


dn (Xo, X1, +++; Xn) =d(Xp) with d invariant with n 


The possible actions depend upon the current state. That is d(X,) € A; whenever X,, = 7. Analysis shows 
the Markov character of the process is preserved under stationary policies. Also, if E is finite and every 
stationary policy produces an irreducible chain, then an optimal stationary policy is optimal. We suppose 
each policy yields an ergodic chain. Since the transition probabilities from any state depend in part on the 
action taken when in that state, the long-run probabilities will depend upon the policy. In the no-discounting 


case, we want to determine a stationary policy that maximizes the gain g = mq for the corresponding long- 
run probabilities. We say “a stationary policy,” since we do not rule out the possibility there may be more 
than one policy which maximizes the gain. In the discounting case, the goal is to maximize the steady state 
part v; in the expressions 

Equation: 


v”) =v,;,+ transients 1<i< M 


In simple cases, with small state space and a small number of possible actions in each state, it may be 
feasible to obtain the gain or present values for each permissible policy and then select the optimum by direct 
comparison. However, for most cases, this is not an efficient approach. In the next two sections, we consider 
iterative procedures for step-by-step optimization which usually greatly reduces the computational burden. 

. Policy iteration method— no discounting 
We develop an iterative procedure for finding an optimal stationary policy. The goal is to determine a 
stationary policy that maximizes the long run expected gain per period g. In the next section, we extend this 
procedure to the case where discounting is in effect. We assume that each policy yields an ergodic chain. 


Suppose we have established that under a given policy (1) v”) =q + Sop (i, joe, where 
j 


qi = E[Gn+1|Xn = t| In this case, the analysis in Sec 4 shows that for large n, (2) y”) = ng + v;, where 
g= Ss 7;q; We note that v; and g depend upon the entire policy, while q; and p(2, 7) depend on the 


4 
individual actions a;. Using (1) and (2), we obtain 
Equation: 


ng+ w= + Spi, d[(n- Ig + wl=a + (m — Ig + DS Pli, dey 


From this we conclude (3) g + vi = qi + op (i, j)v; for all 2 € E Suppose a policy d has been used. That 
j 
is, action d(2) is taken whenever the process is in state i. To simplify writing, we drop the indication of the 
action and simply write p(2, 7) for p;j (d (2)), etc. Associated with this policy, there is a gain g. We should 
like to determine whether or not this is the maximum possible gain, and if it is not, to find a policy which 
does give the maximum gain. The following two-phase procedure has been found to be effective. It is 
essentially the procedure developed originally by Ronald Howard, who pioneered the treatment of these 
problems. The new feature is the method of carrying out the value-determination step, utilizing the 
approximation method noted above. 


1. Value-determination procedure We calculate g = S Qi = 7™q. As in Sec 4, we calculate 
i 


Equation: 
v=Boqe [I + P + P? + ---+ P* — (s+1)Po|q where Po =lim P” 
2. Policy-improvement procedure We suppose policy d has been used through period n. Then, by (3), 
above, 
Equation: 


9+ WH T bac je; 
j 


We seek to improve policy d by selecting policy d*, with d* (1) = Oi, to satisfy 
Equation: 


q; + SP =max {a Gik) + Dopo ag)Ui s Gey jr ewe a}, 1<i<M 


Remarks 


o In the procedure for selecting d’, we use the “old” v; 
o It has been established that g* > g, with equality ift g is optimal. Since there is a finite number of 
policies, the procedure must converge to an optimum stationary policy in a finite number of steps. 


We implement this two step procedure with Matlab. To demonstrate the procedure, we consider the 
following 


Example: 

A numerical example 

A Markov decision process has three states: State space E = {1, 2, 3}. 
Actions: State 1: A; = {1, 2, 3} State 2: Az = {1, 2} State 3: Az = {1, 2} 
Transition probabilities and rewards are: 


pi (1): 113 13 1/3) gi; (1): [13 4] 
pi; (2): [1/4 3/8 3/8] 92; (2): [2 2 3] 
prj (3): [1/3 1/3 1/3] 93; (3): [2 2 3] 
p2; (1): [1/8 3/8 1/2] 92; (1): [2 1 2] 
po; (2): [1/2 1/4 1/4] 92; (2): [14 4] 
p3; (1): [3/8 1/4 3/8] 93; (1): [23 3] 
ps; (2): [1/8 1/4 5/8] 93; (2): [3 2 2] 


Use the policy iteration method to determine the policy which gives the maximum gain g. 

A computational procedure utilizing Matlab We first put the data in an m-file. Since we have several 
cases, it is expedient to include the case number. This example belongs to type 2. Data in file md61.m type 
= 2 PA = [Sls aye) alee aly! 8/8 Ss als) als aye © ©) O©P aves SAS wy/2s a/2 
aba alah (@) (6) @)p eyAs) a eyAsie als) ala! Byisl| Gi = |lal 6 He 2a ee 22 se © ©) 
OF 05 745r@) (POMS BPG SQoeircicors 2 al 2p al alae @ © @e 28 Se eB 2 2] A = fal 
280i 20 i 2) 

md61 type = 2 PA = 0.3333 0.3333 0.3333 0.2500 0.3750 0.3750 0.3333 
Onescon Or cscos On On OMOnI250 0 e750) OF 5000) OF 500080250070) 2500 On OnO 

On G78) Op, As) O. S78) ©.) ©2500 ©6250 GA\SLT ea 2A2ea223g0 0 8 2 
LZie@agqegaeoazegs 8s A22NSd 28 0a 2 8 a 2 


Ee the data are entered into Matlab by the call for file “md61,” we make preparation for the policy- 
improvement step. The procedure is in the file newpolprep.m 

% file newpolprep.m % version of 3/23/92 disp('Data needed:') disp(' 
Matrix PA of transition probabilities for states and actions') disp(' 
Matrix GA of gains for states and actions') disp(' Type number to 
identify GA matrix types') disp(' Type 1. Gains associated with a 
state') disp(' Type 2. One-step transition gains') disp(' Type 3. Gains 
associated with a demand') disp(' Row vector of actions') disp(' Value 
of alpha (= 1 for no discounting)') c = input('Enter type number to show 
gain type '); a = input('Enter value of alpha (= 1 for no discounting) 
"); PA = input('Enter matrix PA of transition probabilities '); GA = 
input('Enter matrix GA of gains '); if c == 3 PD = input('Enter row 
vector of demand probabilities '); end if c == 1 QA = GA'; elseif c == 
QA = diag(PA*GA'); % (qx1) else QA = GA*PD'; end A = input('Enter row 
vector A of possible actions '); % (1xq) m = length(PA(1,:)); g = 
length(A); n = input('Enter n, the power of P to approximate PO '); s = 
input('Enter s, the power of P in the approximation of V '); QD = 
[(1:q)' A' QA]; % First col is index; second is A; third is QA DIS = [' 
|Index Action Value']; disp(DIS) disp(QD) if a == 1 call = 'Call for 
newpol'; else call = 'Call for newpold'; end disp(call) newpolprep % 
Call for preparatory program in file npolprep.m Data needed: Matrix PA 
of transition probabilities for states and actions Matrix GA of gains 
for states and actions Type number to identify GA matrix types Type 1. 
Gains associated with a state Type 2. One-step transition gains Type 3. 
Gains associated with a demand Row vector of actions Value of alpha (= 1 
for no discounting) Enter type number to show gain type 2 Enter value of 
alpha (=1 for no discounting) 1 Enter matrix PA of transition 
probabilities PA Enter matrix GA of gains GA Enter row vector A of 
possible actions A Enter n, the power of P to approximate PO 16 Enter s, 
the power of P in the approximation of V 6 Index Action Value 1.0000 

fl OOOOR2 66672, 0000920000825 3750.3 000083 O000R2 333s 40000 70RG 
5.0000 1.0000 1.6250 6.0000 2.0000 2.5000 7.0000 0 0 8.0000 1.0000 
2.6250 9.0000 2.0000 2.1250 Call for newpol 

‘The procedure has prompted for the procedure newpol (in file newpol.m) 

% file: newpol.m (without discounting) % version of 3/23/92 d = 
input('Enter policy as row matrix of indices '); D = A(d); % policy in 
terms of actions P = PA(d',:); % selects probabilities for policy Q = 
QA(d',:); % selects next-period gains for policy PO = P4n; % Display to 
check convergence PI = PO(1,:); % Long-run distribution G = PI*Q % Long- 
run expected gain per period C = P + eye(P); for j=2:s C=C + PAj; %C 
= I+ P+ PA2 +... + PAs end V = (C - (st1)*PO )*Q; % B = BO*Q disp(' 
") disp('Approximation to PQ; rows are long-run dbn') disp(PO) 
disp('Policy in terms of indices') disp(d) disp('Policy in terms of 
actions') disp(D) disp('Values for the policy selected') disp(V) 
disp('Long-run expected gain per period G') disp(G) T = [(1:q)' A' (QA + 
PA*V)]; % Test values for determining new policy DIS =[' Index Action 
Test Value']; disp(DIS) disp(T) disp('Check current policy against new 
test values.') disp('--If new policy needed, call for newpol') disp('-- 
If not, use policy, values V, and gain G, above') newpol Enter policy as 
row matrix of indices [2 6 9] % A deliberately poor choice Approximation 
to PO; rows are long-run dbn 0.2642 0.2830 0.4528 0.2642 0.2830 0.4528 
0.2642 0.2830 0.4528 Policy in terms of indices 2 6 9 Policy in terms of 
factions 2 2 2 Long-run expected gain per period G 2.2972 Index Action 


Test Value 1.0000 1.0000 2.7171 2.0000 2.0000 2.4168 3.0000 3.0000 

2 Srekers) 4) (OKEXEN8) (6) 16) ts) -(6)eXOX8) al (OxelON8) al (y2748) (6) (okeKe 8) A (olelele) 2oeisy7e7/ 77 a(o}ekere) fe) (6) 
8.0000 1.0000 2.6479 9.0000 2.0000 2.0583 Check current policy against 
new test values. --If new policy needed, call for newpol --If not, use 
policy and gain G, above % New policy is needed newpol Enter policy as 
row matrix of indices [1 6 8] Approximation to PO; rows are long-run dbn 
©.3939 0.2828 0.3232 0.3939 0.2828 0.3232 0.3939 0.2828 0.3232 Policy in 
terms of indices 1 6 8 Policy in terms of actions 1 2 1 Values for 
selected policy 0.0526 -0.0989 0.0223 Long-run expected gain per period 
G 2.6061 Index Action Test Value 1.0000 1.0000 2.6587 2.0000 2.0000 
2.3595 3.0000 3.0000 2.3254 4.0000 0 © 5.0000 1.0000 1.6057 6.0000 
2.0000 2.5072 7.0000 0 0 8.0000 1.0000 2.6284 9.0000 2.0000 2.1208 Check 
current policy against new test values. --If new policy needed, call for 
newpol --If not, use policy, values V, and gain G, above 

Since the policy selected on this iteration is the same as the previous one, the procedure has converged to an 
optimal policy. The first of the first three rows is maximum; the second of the next two rows is maximum; 
and the first of the final two rows is maximum. Thus, we have selected rows 1, 5, 6, corresponding to the 
optimal policy d’ ~ (1 2 1). The expected long-run gain per period g = 2. 6061. 

The value matrix is 


Equation: 
V1 0. 0527 
V=] v2 = | —0. 0989 
U3 0.0223 


We made a somewhat arbitrary choice of the powers of P used in the approximation of Po and Bo. As we 
note in the development of the approximation procedures in Sec 4, the convergence of P” is governed by 
the magnitude of the largest eigenvalue other than one. We can always get a check on the reliability of the 
calculations by checking the eigenvalues for P corresponding to the presumed optimal policy. For the choice 
above, we find the eigenvalues to be 1, -0.1250, 0.0833. Since 0. 1254 = 0. 0002, the choices of exponents 
should be quite satisfactory. In fact, we could probably use P® as a satisfactory approximation to Pg, if that 
were desirable. The margin allows for the possibility that for some policies the eigenvalues may not be so 
small. — 


. Policy iteration— with discounting 
It turns out that the policy iteration procedure is simpler in the case of discounting, as suggested by the 
evolution analysis in Sec 4. We have the following two-phase procedure, based on that analysis. 


1. Value-determination procedure. Given the q; and transition probabilities p(z, 7) for the current policy, 
solve v = q+ aPv to get for the corresponding v; 
Equation: 


v = (|I-—aP]"'q 


2. Policy-improvement procedure Given {v; : 1 < i < M} for the current policy, determine a new 
policy d*, with each d” (1) = a;* determined as that action for which 
Equation: 


j=l 


. M M 
aap dey -mpx {n(ou) + Sopy (oadnoae a) 
Al 


We illustrate the Matlab procedure with the same numerical example as in the previous case, using discount 
factor a = 0.8. The data file is the same, so that we call for it, as before. 


Example: 

Assume data in file md61.m is in Matlab; if not, call for md61. We use the procedure newpolprep to prepare 
for the iterative procedure by making some initial choices. 

newpolprep Data needed: Matrix PA of transition probabilities for states 
and actions Matrix GA of gains for states and actions Type number to 
identify GA matrix types Type 1. Gains associated with a state Type 2. 
One-step transition gains Type 3. Gains associated with a demand Row 
vector of actions Value of alpha (= 1 for no discounting) Enter type 
number to show gain type 2 Enter value of alpha (= 1 for no discounting) 
©.8 Enter matrix PA of transition probabilities PA Enter matrix GA of 
gains GA Enter row vector A of possible actions A Enter n, the power of 
P to approximate PO 16 Enter s, the power of P in the approximation of V 
6 Index Action Test Value 1.0000 1.0000 2.6667 2.0000 2.0000 2.3750 
3.0000 3.0000 2.3333 4.0000 0 0 5.0000 1.0000 1.6250 6.0000 2.0000 
2.5000 7.0000 © 0 8.0000 1.0000 2.6250 9.0000 2.0000 2.1250 Call for 
newpold 

The procedure for policy iteration is in the file newpold.m. 

% file newpold.m (with discounting) % version of 3/23/92 d = 
input('Enter policy as row matrix of indices '); D = A(d); P = PA(d',:); 
% transition matrix for policy selected Q = QA(d',:); % average next- 
period gain for policy selected V = (eye(P) - a*P)\Q; % Present values 
for unlimited future T = [(1:q)' A' (QA + a*PA*V)]; % Test values for 
determining new policy disp(' ') disp('Approximation to PO; rows are 
long-run dbn') disp(P®) disp(' Policy in terms of indices') disp(d) 
disp(' Policy in terms of actions') disp(D) disp(' Values for policy') 
disp(V) DIS =[' Index Action Test Value']; disp(DIS) disp(T) disp('Check 
current policy against new test values.') disp('--If new policy needed, 
call for newpold') disp('--If not, use policy and values above') newpold 
Enter policy as row matrix of indices [3 5 9] % Deliberately poor choice 
Approximation to P0; rows are long-run dbn 0.3939 0.2828 0.3232 0.3939 
10.2828 0.3232 0.3939 0.2828 0.3232 Policy in terms of indices 3 5 9 
Policy in terms of actions 3 1 2 Values for policy 10.3778 9.6167 
10.1722 Index Action Test Value 1.0000 1.0000 10.7111 2.0000 2.0000 
10.3872 3.0000 3.0000 10.3778 4.0000 0 0 5.0000 1.0000 9.6167 6.0000 

2 (OKGKGN9) allo). (ekehste) 7/ ,(6NeleXe) (0) (6) 13} ,(efevaxe) al (exexexe) alle), 7/akess} ©), (lexete) 2, (3fexeke) ale). al722 
Check current policy against new test values. --If new policy needed, 
call for newpold --If not, use policy and values above newpold Enter 
policy as row matrix of indices [1 6 8] Approximation to PQ; rows are 
long-run dbn 0.3939 0.2828 0.3232 0.3939 0.2828 0.3232 0.3939 0.2828 
©.3232 Policy in terms of indices 1 6 8 Policy in terms of actions 1 2 1 
Values for policy 13.0844 12.9302 13.0519 Index Action Test Value 1.0000 
1.0000 13.0844 2.0000 2.0000 12.7865 3.0000 3.0000 12.7511 4.0000 0 0 

yn ONGKOXS) aE s(OKeKeNS) aL (eksseis} (Ce), (eleXeXe) Za texerele) ala slope 7/ ,(efeXohe) (8) (6) 18).,(alexeke) aL eleyele, 
13.0519 9.0000 2.0000 12.5455 Check current policy against new test 
values. --If new policy needed, call for newpold --If not, use policy 
and values above 

Since the policy indicated is the same as the previous policy, we know this is an optimal policy. Rows 1, 6, 
8, indicate the optimal policy to be d*~ (1, 2, 1). It turns out in this example that the optimal policies are 


the same for the discounted and undiscounted cases. That is not always true. The v matrix gives the present 
values for unlimited futures, for each of the three possible starting states. 


Queues with Poisson Arrivals, Exponential Servers 
Basic theory of one- and two-server ques with Poisson arrivals and 
exponential servers. Matlab calculations provide numerical examples. 


A standard model of a queueing system with a single waiting line and one or 
more servers assumes that “customers” arrive according to a Poisson process 
with rate (A). The customer at the head of the line goes to the first available 
server, if there are more than one, or to the single server as soon as available, 
if there is only one. The servers operate independently (of each other and the 
arrival process), each with exponential service time. We suppose each server 
has the same distribution, exponential (1). Such a system may be analyzed as 
a Markov birth-death process. An analysis of the long-run probabilities and 
expectations of various quantities after the system has settled down to 
equilibrium yields the results below. 


Calculation of these quantities is straightforward, but somewhat tedious if 
various cases are considered. Matlab procedures for single-server and two- 
server systems are utilized to make these calculations quickly and to present 
them in a useful way. 


Notation 


N+ = number in system (in service and waiting) at time t 

e (; = number waiting to be served at time t 

T = lim p;j (t) = long-run probability of being in state j 
—0o 


e W, = waiting time for service for customer who arrives at time t 
e D; = waiting time plus service time for customer who arrives at time t 


A = random variable with distribution of interarrival times 
S = random variable with distribution of service times 


Long-run probabilities 7; = P(N; = j) for large t, s servers, 
E[A] = 1/2, E[S] = 1/x 


Fors:= 1, 


+ p= EIS|/ElA] 


oe 71 = 1— pr, 


= A/u 
(1 — p)p” 


e N; is approximately geometric (1 — p) 
For s > 1, 


© p= E|S|/sE|A] = X/su 
To (sp)"/n! = m0 (A/p)"/n! O<n<s 


© L tr0 (8°/s!)p" = m0 [(sp)*/s!]p™* 8 <n 
For s = 2 
ep 2K 
aoe a Co 
Fors =3 


l=? 
err irs 
Le 2p sp 


Fors=4 


Lp 


oA) 2 
1+ 3p + 8p? + 3 p3 


For large t, with the system in equilibrium 
Equation: 


E[D;] = E[A]E[N;] and E[W;] = E[AJE[Q,] ~ E[S|E [Ny] 


© E(Q:] = pE[M:|P (Ni > 0) = p 
» 
- E[W,] = E[SJE[Ni] = ne 
p—a” 
e D,is approximately exponential (4 — A) 


Fors > 1 


(sp)° 
sl(1 — p) 


= P(W,>0) = 10 = F[Q,J—* = su(1—-pE|Wi] 


. 

P(W, > v) = Ce—s-Al¥y > 0 

P(D Bee lees 1 
e Bas 1 — = 
(De > v) =" + Cum | for A (= 1 
P(D; > v) =e" [1 + pC] for A = w(s — 1) 

El (sey p 

a oe 

: BENE Eat ts GaSe 


Qt] = To 


Matlab calculations for single server queue (in file queue1.m) 


L = input('Enter lambda '); % Type desired 
value, no extra space 
M = input('Enter mu '); % Type desired 
value, no extra space 
= [' lambda mu']; 
b = [L M]; 
disp(a) 
disp(b) 
r = L/M; % Rho 
EN = r/(1 - r); % E[N]| 
EQ = r*EN; % E[Q] 
EW = EQ/L; % E[W] 
ED = EN/L; % E[D] 


LST! rho EN EQ EW 


ED']; % Identifies entries in B 
B = [r EN EQ Ew ED]; 

disp(A) 

disp(B) 


v = input('Enter row matrix of values v 
Type matrix of desired values 


PD = exp(-M*(1 - r)*v); 
Calculates P(Dt > v) 


Sel: Vv P(D>v)']; 
Ss = [v; PD]',; 

disp(S) 

disp(s) 


Example: 


queue1 


Enter lambda 0.1 
Enter mu 0.2 
lambda mu 
© .1000 @.2000 


rho EN EQ EW 
0.5000 1.0000 0.5000 5.0000 


Enter row matrix of values v [8 16 24] 
V P(D>v) 
8.0000 0.4493 
16.0000 0.2019 
24.0000 0.0907 


eae 


% 


ED 
10.0000 


Matlab calculations for two-server queue (in file queue2.m) 


Note that the procedure will not calculate P(D > v) if A = p. 


L = input('Enter lambda ‘'); % Type desired 
value, no extra space 

M = input('Enter mu '); % Type desired 
value, no extra space 

a= [' lambda mu']; 

b = [LM]; 

disp(a) 

disp(b) 


r = L/(2*M); 


EQ = (2*rA3)/(1 - rA2); 

EN = EQ + 2*r; 

EW = EQ/L; 

ED = EN/L; 

A = [' rho EN EQ Ew ED']; % Identifies entries 
in B 

B = [r EN EQ Ew ED]; 

disp(A) 

disp(B) 


v = input('Enter row matrix of values v_ '); 


t = 2*M*EW*(1 - r)/(1 - 2*r); 
PD2 = exp(-M*v).*(1 + t.*(1 - exp(-M*v + L*v))); % 
Calculates P(D > v) for L not equal M 


S = [' v P(D>v) "J; 
s = [v; PD2]'; 
disp(S) 


disp(s) 


Example: 


queue2 


Enter lambda 0.1 


Enter mu 0.2 
lambda mu 
@.1000 0.2000 

rho EN 
@.2500 0.5333 
Enter row matrix of 
V P(D>v) 
4.0000 0.4790 
8.0000 0.2241 
16.0000 0.0473 


Comparison of single-server and two-server queues 


EQ EW 
0.0333 0.3333 
values v [4 8 16] 


ED 
5.3333 


A queueing system has Poisson arrivals, rate A and exponential (ju) service 


times. 


a. In system one, there is one server, with expected service time 1/p = 1 


minute. Determine 
Equation: 


[E[N], E[Q], E[W], E|D], and P(D > v),v =1,3,5,10 


for expected arrival rates \ = 0.6, 0.9, 0.99 customers per minute. 
b. In system two there are two servers, each with expected service time 

1/p = 2 minutes. Calculate the same quantities as for system one and 

compare the results for the two systems. 


queuel 
Enter lambda 0.6 


Enter mu 1 
lambda mu 
0.6000 1.0000 


rho EN EQ EW ED 
0.6000 1.5000 0.9000 1.5000 2.5000 


Enter row matrix of values v [1 3 5 10] 
V P(D>v) 
1.0000 0.6703 
3.0000 0.3012 
5.0000 0.1353 
0.0000 0.0183 
= ones(1, length(v)); 
R = r*Ov; % Row vector with all 


queuel 
Enter lambda 0.9 


Enter mu 1 
lambda mu 
0.9000 1.0000 


rho EN EQ EW ED 
0.9000 9.0000 8.1000 9.0000 10.0000 


Enter row matrix of values v v % Calls for 
previously entered v 
V P(D>v) 
1.0000 0.9048 
3.0000 0.7408 
5.0000 0.6065 
10.0000 0.3679 
R= r*Ov; 
r? = R: 


E12 
v1i2 


PD; 


queue 


Enter lambda 0.99 


Enter mu 
lambda 
0.9900 


rho 
0.9900 


Enter row 
Vv 

1.0000 

3.0000 

5.0000 

10.0000 

R = r*Ov; 
r3 = R; 
E13 = B; 


vi3 = 


queue2 


1 
mu 
1.0000 


EN 
99.0000 


matrix of 
P(D>v) 
0.9900 
0.9704 
0.9512 
0.9048 


second system 
Enter lambda 0.6 


Enter mu 
lambda 
0.6000 


rho 
0.6000 


0.5 
mu 
0.5000 


EN 
1.8750 


Enter row matrix of 


V 
1.0000 
3.ANAO 


P(D>v) 
0.7501 
A.32988 


EQ 
98.0100 


values v v 


EW 


ED 


99.0000 100.0000 


% Begin calculations for 


EQ 
0.6750 


values v v 


EW 
1.1250 


ED 
3.1250 


wivvvwe 


5.0000 

10.0000 
E21 = B; 
Pb, F257 FS 
v21 = PD2; 
system one 


queue2 

Enter lamb 

Enter mu 
lambda 
0.9000 


rho 
0.9000 


Enter row 
V 
1.0000 
3.0000 
5.0000 
10.0000 


E22 
V22 


B,; 
PD2; 


queue2 

Enter lamb 

Enter mu 
lambda 
0.9900 


rho 
0.9900 


wriwvvwsw 


0.2019 
0.0328 
% Not necessary to determne 
, since 
% they are the same as for 


da 0.9 
0.5 
mu 
0.5000 


EN EQ EW ED 
9.4737 7.6737 8.5263 10.5263 


matrix of values v Vv 
P(D>v) 
0.9245 
0.7749 
0.6410 
@.3916 


da 0.99 
0.5 
mu 
0.5000 


EN EQ EW ED 
99.4975 97.5175 98.5025 100.5025 


Enter row matrix of values v Vv 


V 
1.MN0O 


P(D>v) 
A.997A 


~—rewvvwes -_wwvwew 


3.0000 0.9743 
5.0000 0.9557 
10.0000 0.9094 
E23 = B; 
v23 = PD2; 


C = [E11; E21; zeros(E11); E12; E22; zeros(E11); 
E13; E23]; % Zeros are spacers 


disp(A) 
rho EN EQ EW ED 

disp(C) 
0.6000 1.5000 0.9000 1.5000 . 5000 


2 
0.6000 1.8750 0.6750 1.1250 3.1250 
0 0 0 0 0 
0.9000 9.0000 9.0000 10.0000 
0.9000 9.4737 .6737 8.5263 10.5263 
0 0 0 0 0 
0.9900 99.0000 98.0100 99.0000 100.0000 
0.9900 99.4975 97,5175 98.5025 100.5025 


NI 00 
Js 
io) 
io) 
io) 


oe rho Vv P(D1i>v)  P(D2>v)']; 
PDV = [ri r2 r3; v v v; vil vi2 v13; v21 v22 v23]|'; 


disp(H) 
rho V P(D1>v) P(D2>v) 
disp(PDV) 
1.0000 1.0000 0.6703 0.7501 
1.0000 3.0000 0.3012 0.3988 
1.0000 5.0000 0.1353 0.2019 
1.0000 10.0000 0.0183 0.0328 
0.9000 1.0000 0.9048 0.9245 
0.9000 3.0000 0.7408 0.7749 
0.9000 5.0000 0.6065 0.6410 
0.9000 10.0000 0.3679 0.3916 
0.9900 1.0000 0.9900 0.9920 
0.9900 3.0000 0.9704 0.9743 
0.9900 5.0000 0.9512 0.9557 
A.99A0A 10.AAAA A.9048 A.9094 


wvvwy —mewrtiuwvvvwes wriwvrsiwa wivvw 


